Haoyuan Shi

Papers: 7

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

arXiv 2025

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

arXiv 2025

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

arXiv 2025

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

arXiv 2025

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

arXiv 2025

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

arXiv 2024

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 7 papers

Baotian Hu

Min Zhang

Yunxin Li

Xinyu Chen

Longyue Wang

Shenyuan Jiang

Xuanyu Zhang

Zhenyu Liu

Wenhan Luo

Zhenran Xu