Shoubin Yu
- Papers: 13
Authored papers (13)
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
arXiv 2026
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
arXiv 2026
VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
arXiv 2026
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
arXiv 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
arXiv 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
arXiv 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
arXiv 2025
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
arXiv 2024
RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
arXiv 2024
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
arXiv 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
arXiv 2024
A Simple LLM Framework for Long-Range Video Question-Answering
arXiv 2023
Self-Chained Image-Language Model for Video Localization and Question Answering