Shoubin Yu

Papers: 13

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

arXiv 2026

2026

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

arXiv 2026

2026

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

arXiv 2026

2026

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

arXiv 2025

2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

arXiv 2025

2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

arXiv 2025

2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation

arXiv 2025

2025

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

arXiv 2024

2024

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

arXiv 2024

2024

RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives

arXiv 2024

2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

arXiv 2024

2024

A Simple LLM Framework for Long-Range Video Question-Answering

arXiv 2023

2023

Self-Chained Image-Language Model for Video Localization and Question Answering

2023

Affiliations

No known affiliations.

Frequent co-authors

from 13 papers

Mohit Bansal

Jaehong Yoon

Yue Zhang

Huaxiu Yao

Jaemin Cho

Jialu Li

Ziyang Wang

Zun Wang

Andong Deng

Antoine Yang