Jitendra Malik
- Papers: 30
Authored papers (30)
World Model for Robot Learning: A Comprehensive Survey
arXiv 2026
SAM 3D Body: Robust Full-Body Human Mesh Recovery
arXiv 2026
AutoEval Done Right: Using Synthetic Data for Model Evaluation
arXiv 2024
SAM 3D: 3Dfy Anything in Images
arXiv 2025
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
arXiv 2025
SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
arXiv 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
arXiv 2025
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
arXiv 2025
Large Video Planner Enables Generalizable Robot Control
arXiv 2025
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
arXiv 2025
Digitizing Touch with an Artificial Multimodal Fingertip
arXiv 2024
Wolf: Captioning Everything with a World Summarization Framework
arXiv 2024
xT: Nested Tokenization for Larger Context in Large Images
arXiv 2024
Reconstructing Hand-Held Objects in 3D from Images and Videos
arXiv 2024
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
arXiv 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
Multiview Compressive Coding for 3D Reconstruction
CVPR 2023 1
Speculative Decoding with Big Little Decoder
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024 1
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
arXiv 2023
Humans in 4D: Reconstructing and Tracking Humans with Transformers
ICCV 2023 1
Interactive Task Planning with Language Models
arXiv 2023
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
CVPR 2022 1
Learning to Learn with Generative Models of Neural Network Checkpoints
arXiv 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
arXiv 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
CVPR 2022 1
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
NeurIPS 2021 12
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022 1
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Frequent co-authors (10, from 30 papers)
Karttikeya Mangalam
Pieter Abbeel (professor)
Haoran Geng
Trevor Darrell (professor)
Christoph Feichtenhofer
Boyi Li
Georgia Gkioxari
Ilija Radosavovic
Marco Pavone
Matt Feiszli