Trevor Darrell
UC Berkeley professor; one of the most cited computer-vision researchers, a co-founder of the BAIR Lab, and a co-creator of the BDD100K dataset.
- Role: professor
- Currently at: University of California, Berkeley
- Scholar: scholar.google.com/citations
- Papers: 69
Authored papers (69)
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing
arXiv 2026
Learning a Generative Meta-Model of LLM Activations
arXiv 2026
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
arXiv 2026
Describe Anything: Detailed Localized Image and Video Captioning
ICCV 2025
REOrdering Patches Improves Vision Models
arXiv 2025
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
arXiv 2025
Learning Adaptive Parallel Reasoning with Language Models
arXiv 2025
Video Action Differencing
arXiv 2025
Search Arena: Analyzing Search-Augmented LLMs
arXiv 2025
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
TULIP: Towards Unified Language-Image Pretraining
arXiv 2025
AutoPresent: Designing Structured Visuals from Scratch
CVPR 2025
Pillar-0: A New Frontier for Radiology Foundation Models
arXiv 2025
Reconstruction Alignment Improves Unified Multimodal Models
arXiv 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
arXiv 2025
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos
arXiv 2025
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
arXiv 2025
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint
arXiv 2025
Constantly Improving Image Models Need Constantly Improving Benchmarks
arXiv 2025
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
arXiv 2025
Visually Prompted Benchmarks Are Surprisingly Fragile
arXiv 2025
Navigation World Models
CVPR 2025
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
arXiv 2024
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
arXiv 2024
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
CVPR 2024
Segment Anything without Supervision
arXiv 2024
When Do We Not Need Larger Vision Models?
arXiv 2024
Neural Network Diffusion
arXiv 2024
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
Rethinking Patch Dependence for Masked Autoencoders
arXiv 2024
xT: Nested Tokenization for Larger Context in Large Images
arXiv 2024
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
arXiv 2024
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
arXiv 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
arXiv 2024
Wolf: Captioning Everything with a World Summarization Framework
arXiv 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
arXiv 2024
Initializing Models with Larger Ones
arXiv 2023
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
Compositional Chain-of-Thought Prompting for Large Multimodal Models
CVPR 2024
Stochastic positional embeddings improve masked image modeling
arXiv 2023
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
arXiv 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Dropout Reduces Underfitting
arXiv 2023
Unsupervised Universal Image Segmentation
CVPR 2024
Describing Differences in Image Sets with Natural Language
CVPR 2024
Guiding Pretraining in Reinforcement Learning with Large Language Models
arXiv 2023
Modular Visual Question Answering via Code Generation
arXiv 2023
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor
arXiv 2023
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
arXiv 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024
A ConvNet for the 2020s
CVPR 2022
Visual Prompting via Image Inpainting
arXiv 2022
Back to the Source: Diffusion-Driven Test-Time Adaptation
arXiv 2022
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
ICCV 2023
Contrastive Test-Time Adaptation
CVPR 2022
Multitask Vision-Language Prompt Tuning
arXiv 2022
Refine and Represent: Region-to-Object Representation Learning
arXiv 2022
Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks
NeurIPS 2021
Tent: Fully Test-time Adaptation by Entropy Minimization
BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
Rethinking the Value of Network Pruning
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
arXiv 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
SkipNet: Learning Dynamic Routing in Convolutional Networks
Deep Layer Aggregation
Context Encoders: Feature Learning by Inpainting
End-to-end Learning of Driving Models from Large-scale Video Datasets
Caffe: Convolutional Architecture for Fast Feature Embedding
arXiv 2014
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
arXiv 2014
Frequent co-authors
10 from 69 papers