Haoyuan Shi
- Papers: 7
Authored papers (7)
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
arXiv 2025
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
arXiv 2025
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
arXiv 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
arXiv 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
arXiv 2025
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
arXiv 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
arXiv 2024
Frequent co-authors
10 from 7 papers