Yan Shu
- Papers: 10
Authored papers
Qwen-Image-VAE-2.0 Technical Report
arXiv 2026
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
arXiv 2026
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models
arXiv 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
arXiv 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
arXiv 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
arXiv 2025
VidText: Towards Comprehensive Evaluation for Video Text Understanding
arXiv 2025
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web
arXiv 2025
MLVU: Benchmarking Multi-task Long Video Understanding
CVPR 2025
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
arXiv 2024