Bernt Schiele

Test-Time Visual In-Context Tuning

CVPR 2025

AnyUp: Universal Feature Upsampling

arXiv 2025

PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation

arXiv 2025

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

arXiv 2025

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

arXiv 2024

GiT: Towards Generalist Vision Transformer through Universal Language Interface

arXiv 2024

Number it: Temporal Grounding Videos like Flipping Manga

CVPR 2025

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

arXiv 2024

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

arXiv 2024

Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

arXiv 2024

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

arXiv 2024

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

CVPR 2023

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

arXiv 2023

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

ICCV 2023

Object-Centric Multiple Object Tracking

ICCV 2023

Studying How to Efficiently and Effectively Guide Models with Explanations

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

ICCV 2023

Better Understanding Differences in Attribution Methods via Systematic Evaluations

arXiv 2023

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

ICCV 2023

Unsupervised Open-Vocabulary Object Localization in Videos

ICCV 2023

Robustifying Token Attention for Vision Transformers

ICCV 2023