Jacob Andreas

Papers: 22

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

arXiv 2026

2026

Self-Steering Language Models

arXiv 2025

2025

Training Language Models to Explain Their Own Computations

arXiv 2025

2025

Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models

arXiv 2024

2024

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

arXiv 2024

2024

A Multimodal Automated Interpretability Agent

arXiv 2024

2024

In-Context Language Learning: Architectures and Algorithms

arXiv 2024

2024

Language Modeling with Editable External Knowledge

arXiv 2024

2024

Grokking of Hierarchical Structure in Vanilla Transformers

arXiv 2023

2023

Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

arXiv 2023

2023

Interpreting User Requests in the Context of Natural Language Standing Instructions

arXiv 2023

2023

LILO: Learning Interpretable Libraries by Compressing and Documenting Code

arXiv 2023

2023

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

arXiv 2023

2023

Inspecting and Editing Knowledge Representations in Language Models

arXiv 2023

2023

Eliciting Human Preferences with Language Models

arXiv 2023

2023

Decision-Oriented Dialogue for Human-AI Collaboration

arXiv 2023

2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

arXiv 2023

2023

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

arXiv 2023

2023

FIND: A Function Description Benchmark for Evaluating Interpretability Methods

2023

PromptBoosting: Black-Box Text Classification with Ten Forward Passes

arXiv 2022

2022

Towards Tracing Factual Knowledge in Language Models Back to the Training Data

arXiv 2022

2022

Toward a Visual Concept Vocabulary for GAN Latent Space

2021

Affiliations

No known affiliations.

Frequent co-authors

from 22 papers

Ekin Akyürek

Belinda Z. Li

Gabriel Grand

Yoon Kim

Antonio Torralba (professor)

Evan Hernandez

Joshua B. Tenenbaum

Sarah Schwettmann

Alexander K. Lew

Alexis Ross