Martin Wattenberg

Papers: 14

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

arXiv 2025

2025

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

arXiv 2025

2025

Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner

arXiv 2024

2024

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

arXiv 2024

2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

arXiv 2024

2024

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

arXiv 2024

2024

Designing a Dashboard for Transparency and Control of Conversational AI

arXiv 2024

2024

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

NeurIPS 2023

2023

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

arXiv 2023

2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

arXiv 2023

2023

Toy Models of Superposition

arXiv 2022

2022

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

arXiv 2022

2022

GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation

arXiv 2018

2018

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

2017

Affiliations

No known affiliations.

Frequent co-authors

from 14 papers

Fernanda Viégas

Kenneth Li

Andrew Lee

Hanspeter Pfister

David Bau

Itamar Pres

Oam Patel

Xiaoyan Bai

Yida Chen

Aoyu Wu