Manling Li
Authored papers
29
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
arXiv 2026
RAGEN-2: Reasoning Collapse in Agentic RL
arXiv 2026
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery
arXiv 2026
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
arXiv 2026
Interactive Evaluation Requires a Design Science
arXiv 2026
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
arXiv 2026
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
arXiv 2025
Adaptation of Agentic AI
arXiv 2025
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
arXiv 2025
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
Exploring Diffusion Transformer Designs via Grafting
arXiv 2025
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
arXiv 2025
LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World
arXiv 2025
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
arXiv 2025
Spatial Mental Modeling from Limited Views
arXiv 2025
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
arXiv 2025
CaptionQA: Is Your Caption as Useful as the Image Itself?
arXiv 2025
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
arXiv 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
arXiv 2025
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
arXiv 2024
HourVideo: 1-Hour Video-Language Understanding
arXiv 2024
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
arXiv 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
arXiv 2024
Visually Descriptive Language Model for Vector Graphics Reasoning
arXiv 2024
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
arXiv 2024
Word Embeddings Are Steers for Language Models
arXiv 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
arXiv 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
arXiv 2023
Multimedia Generative Script Learning for Task Planning
arXiv 2022
Frequent co-authors
10 from 29 papers
Jiajun Wu
Fei-Fei Li
professor
Heng Ji
professor
Qineng Wang
Zihan Wang
Yejin Choi
professor
Kangrui Wang
Keshigeyan Chandrasegaran
Pingyue Zhang
Juan Carlos Niebles