Fei-Fei Li

Stanford CS professor, co-founder of World Labs, co-director of Stanford HAI; creator of ImageNet and one of the defining figures of modern computer vision.

Role: professor
Currently at: Stanford University
Twitter: twitter.com/drfeifei
GitHub: Unknown
Scholar: scholar.google.com/citations
Papers: 25

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers

25

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

arXiv 2026

RAGEN-2: Reasoning Collapse in Agentic RL

arXiv 2026

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

arXiv 2026

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

arXiv 2026

PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation

arXiv 2026

s1: Simple Test-Time Scaling

arXiv 2025

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

arXiv 2025

Re-thinking Temporal Search for Long-Form Video Understanding

CVPR 2025 1

BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

arXiv 2025

Exploring Diffusion Transformer Designs via Grafting

arXiv 2025

ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

arXiv 2025

QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models

arXiv 2025

Spatial Mental Modeling from Limited Views

arXiv 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

CVPR 2025 1

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

arXiv 2024

HourVideo: 1-Hour Video-Language Understanding

arXiv 2024

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

arXiv 2024

Agent AI: Surveying the Horizons of Multimodal Interaction

arXiv 2024

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

arXiv 2023

ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

CVPR 2024 1

Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI

arXiv 2023

VIMA: General Robot Manipulation with Multimodal Prompts

arXiv 2022

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

ACL 2021 5

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

arXiv 2018

Inferring and Executing Programs for Visual Reasoning

ICCV 2017

Affiliations

Currently at

Stanford University

professor · university lab

Previously

Google (Alphabet Inc.) · frontier lab

World Labs · startup

Frequent co-authors

10

from 25 papers

Jiajun Wu

15 shared papers

Manling Li

11 shared papers

Ruohan Zhang

7 shared papers

Qineng Wang

6 shared papers

Yejin Choi

professor

6 shared papers

Keshigeyan Chandrasegaran

5 shared papers

Zihan Wang

5 shared papers

Kangrui Wang

4 shared papers

Pingyue Zhang

4 shared papers

Agrim Gupta

3 shared papers