Baotian Hu
- Papers: 32
Authored papers
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
arXiv 2026
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
arXiv 2026
LMEB: Long-horizon Memory Embedding Benchmark
arXiv 2026
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
arXiv 2025
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
arXiv 2025
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
arXiv 2025
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
arXiv 2025
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
arXiv 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
arXiv 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
arXiv 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
arXiv 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
arXiv 2025
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
arXiv 2025
Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs
arXiv 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
arXiv 2025
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
arXiv 2024
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
arXiv 2024
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
arXiv 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
arXiv 2024
In-Context Learning State Vector with Inner and Momentum Optimization
arXiv 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
arXiv 2024
LMEye: An Interactive Perception Network for Large Language Models
arXiv 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
arXiv 2023
Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration
arXiv 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
arXiv 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
arXiv 2023
A Read-and-Select Framework for Zero-shot Entity Linking
arXiv 2023
A Survey of Large Language Models Attribution
arXiv 2023
Revisiting Sparse Retrieval for Few-shot Entity Linking
arXiv 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
arXiv 2023
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
arXiv 2022
Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
arXiv 2022
Frequent co-authors
10 from 32 papers