Jitendra Malik

Authored papers

30

World Model for Robot Learning: A Comprehensive Survey

arXiv 2026

2026

SAM 3D Body: Robust Full-Body Human Mesh Recovery

arXiv 2026

2026

AutoEval Done Right: Using Synthetic Data for Model Evaluation

arXiv 2024

2026

SAM 3D: 3Dfy Anything in Images

arXiv 2025

2025

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

arXiv 2025

2025

SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

arXiv 2025

2025

OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

arXiv 2025

2025

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

arXiv 2025

2025

Large Video Planner Enables Generalizable Robot Control

arXiv 2025

2025

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

arXiv 2025

2025

Digitizing Touch with an Artificial Multimodal Fingertip

arXiv 2024

2024

Wolf: Captioning Everything with a World Summarization Framework

arXiv 2024

2024

xT: Nested Tokenization for Larger Context in Large Images

arXiv 2024

2024

Reconstructing Hand-Held Objects in 3D from Images and Videos

arXiv 2024

2024

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

arXiv 2023

2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

2023

Multiview Compressive Coding for 3D Reconstruction

CVPR 2023

2023

Speculative Decoding with Big Little Decoder

2023

Sequential Modeling Enables Scalable Learning for Large Vision Models

CVPR 2024

2023

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

arXiv 2023

2023

Humans in 4D: Reconstructing and Tracking Humans with Transformers

ICCV 2023

2023

Interactive Task Planning with Language Models

arXiv 2023

2023

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

CVPR 2022

2022

Learning to Learn with Generative Models of Neural Network Checkpoints

arXiv 2022

2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

arXiv 2022

2022

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

CVPR 2022

2021

Habitat 2.0: Training Home Assistants to Rearrange their Habitat

NeurIPS 2021

2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022

2021

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

2021

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

2019

Affiliations

No known affiliations.
