Fei-Fei Li
Stanford CS professor, co-founder of World Labs, co-director of Stanford HAI; creator of ImageNet and one of the defining figures of modern computer vision.
- Role: professor
- Currently at: Stanford University
- Twitter: twitter.com/drfeifei
- GitHub: Unknown
- Scholar: scholar.google.com/citations
- Papers: 25
Authored papers (25)
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
arXiv 2026
RAGEN-2: Reasoning Collapse in Agentic RL
arXiv 2026
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
arXiv 2026
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
arXiv 2026
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
arXiv 2026
s1: Simple Test-Time Scaling
preprint
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
arXiv 2025
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025 1
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
arXiv 2025
Exploring Diffusion Transformer Designs via Grafting
arXiv 2025
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
arXiv 2025
Spatial Mental Modeling from Limited Views
arXiv 2025
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
arXiv 2025
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
CVPR 2025 1
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
arXiv 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
arXiv 2024
HourVideo: 1-Hour Video-Language Understanding
arXiv 2024
Agent AI: Surveying the Horizons of Multimodal Interaction
arXiv 2024
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
arXiv 2023
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
CVPR 2024 1
Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI
arXiv 2023
VIMA: General Robot Manipulation with Multimodal Prompts
arXiv 2022
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
ACL 2021 5
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
arXiv 2018
Inferring and Executing Programs for Visual Reasoning
Frequent co-authors
10 from 25 papers