Trevor Darrell

Professor at UC Berkeley; one of the most-cited computer-vision researchers, a co-founder of the Berkeley Artificial Intelligence Research (BAIR) Lab, and a co-creator of the BDD100K dataset.

Role: professor
Papers: 69

Authored papers

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

arXiv 2026

2026

Learning a Generative Meta-Model of LLM Activations

arXiv 2026

2026

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

arXiv 2026

2026

Describe Anything: Detailed Localized Image and Video Captioning

ICCV 2025

2025

REOrdering Patches Improves Vision Models

arXiv 2025

2025

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

arXiv 2025

2025

Learning Adaptive Parallel Reasoning with Language Models

arXiv 2025

2025

Video Action Differencing

arXiv 2025

2025

Search Arena: Analyzing Search-Augmented LLMs

arXiv 2025

2025

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025

2025

TULIP: Towards Unified Language-Image Pretraining

arXiv 2025

2025

AutoPresent: Designing Structured Visuals from Scratch

CVPR 2025

2025

Pillar-0: A New Frontier for Radiology Foundation Models

arXiv 2025

2025

Reconstruction Alignment Improves Unified Multimodal Models

arXiv 2025

2025

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

arXiv 2025

2025

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

arXiv 2025

2025

UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

arXiv 2025

2025

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

arXiv 2025

2025

Constantly Improving Image Models Need Constantly Improving Benchmarks

arXiv 2025

2025

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

arXiv 2025

2025

Visually Prompted Benchmarks Are Surprisingly Fragile

arXiv 2025

2025

Navigation World Models

CVPR 2025

2024

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

arXiv 2024

2024

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

arXiv 2024

2024

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

CVPR 2024

2024

Segment Anything without Supervision

arXiv 2024

2024

When Do We Not Need Larger Vision Models?

arXiv 2024

2024

Neural Network Diffusion

arXiv 2024

2024

InstanceDiffusion: Instance-level Control for Image Generation

CVPR 2024

2024

Rethinking Patch Dependence for Masked Autoencoders

arXiv 2024

2024

xT: Nested Tokenization for Larger Context in Large Images

arXiv 2024

2024

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

arXiv 2024

2024

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

arXiv 2024

2024

Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

arXiv 2024

2024

Wolf: Captioning Everything with a World Summarization Framework

arXiv 2024

2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

arXiv 2024

2024

Initializing Models with Larger Ones

arXiv 2023

2023

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

CVPR 2024

2023

Compositional Chain-of-Thought Prompting for Large Multimodal Models

CVPR 2024

2023

Stochastic Positional Embeddings Improve Masked Image Modeling

arXiv 2023

2023

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

arXiv 2023

2023

Hierarchical Open-vocabulary Universal Image Segmentation

2023

Dropout Reduces Underfitting

arXiv 2023

2023

Unsupervised Universal Image Segmentation

CVPR 2024

2023

Describing Differences in Image Sets with Natural Language

CVPR 2024

2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

arXiv 2023

2023

Modular Visual Question Answering via Code Generation

arXiv 2023

2023

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

arXiv 2023

2023

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

arXiv 2023

2023

Sequential Modeling Enables Scalable Learning for Large Vision Models

CVPR 2024

2023

A ConvNet for the 2020s

CVPR 2022

2022

Visual Prompting via Image Inpainting

arXiv 2022

2022

Back to the Source: Diffusion-Driven Test-Time Adaptation

arXiv 2022

2022

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

ICCV 2023

2022

Contrastive Test-Time Adaptation

CVPR 2022

2022

Multitask Vision-Language Prompt Tuning

arXiv 2022

2022

Refine and Represent: Region-to-Object Representation Learning

arXiv 2022

2022

Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks

NeurIPS 2021

2021

Tent: Fully Test-time Adaptation by Entropy Minimization

2020

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

2018

Rethinking the Value of Network Pruning

2018

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

arXiv 2018

2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

2018

SkipNet: Learning Dynamic Routing in Convolutional Networks

2017

Deep Layer Aggregation

2017

Context Encoders: Feature Learning by Inpainting

2016

End-to-end Learning of Driving Models from Large-scale Video Datasets

2016

Caffe: Convolutional Architecture for Fast Feature Embedding

arXiv 2014

2014

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

arXiv 2014

2014

Affiliations

Currently at

University of California, Berkeley

professor · university lab

Frequent co-authors: 10 (from 69 papers)