Yunxin Li
- Papers: 16
Authored papers (16)
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
arXiv 2026
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
arXiv 2025
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
arXiv 2025
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
arXiv 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
arXiv 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
arXiv 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
arXiv 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
arXiv 2025
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
arXiv 2024
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
arXiv 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
arXiv 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
arXiv 2024
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
arXiv 2023
LMEye: An Interactive Perception Network for Large Language Models
arXiv 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
arXiv 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
arXiv 2023
Affiliations
Frequent co-authors
10 from 16 papers