Yonatan Belinkov

Papers
21

Authored papers

21

SAEs Are Good for Steering -- If You Select the Right Features
arXiv 2025

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
arXiv 2025

Position-aware Automatic Circuit Discovery
arXiv 2025

Inside-Out: Hidden Factual Knowledge in LLMs
arXiv 2025

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
arXiv 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
arXiv 2024

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
arXiv 2024

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
arXiv 2024

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
arXiv 2024

Distinguishing Ignorance from Error in LLM Hallucinations
arXiv 2024

Confidence Regulation Neurons in Language Models
arXiv 2024

Semantics and Spatiality of Emergent Communication
arXiv 2024

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
arXiv 2023

Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
arXiv 2023

Unified Concept Editing in Diffusion Models
arXiv 2023

Generating Benchmarks for Factuality Evaluation of Language Models
arXiv 2023

VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR 2022

Locating and Editing Factual Associations in GPT
arXiv 2022

Mass-Editing Memory in a Transformer
arXiv 2022

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future
2019

Affiliations

No known affiliations.

Frequent co-authors

10 (from 21 papers)