Mor Geva
- Papers: 33
Authored papers (33)
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth
arXiv 2026
From Directions to Regions: Decomposing Activations in Language Models via Local Geometry
arXiv 2026
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
arXiv 2025
Rethinking Selective Knowledge Distillation
arXiv 2026
Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models
arXiv 2026
Precise In-Parameter Concept Erasure in Large Language Models
arXiv 2025
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
arXiv 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
arXiv 2025
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
arXiv 2025
Universal Jailbreak Suffixes Are Strong Attention Hijackers
arXiv 2025
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
arXiv 2025
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
arXiv 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
arXiv 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
arXiv 2024
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
arXiv 2024
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
arXiv 2024
Inferring Functionality of Attention Heads from their Parameters
arXiv 2024
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
arXiv 2024
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
arXiv 2024
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
arXiv 2024
The Hidden Space of Transformer Language Adapters
arXiv 2024
In-Context Learning Creates Task Vectors
arXiv 2023
The Hidden Language of Diffusion Models
arXiv 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
arXiv 2023
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
arXiv 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences
arXiv 2022
Analyzing Transformers in Embedding Space
arXiv 2022
Inferring Implicit Relations in Complex Questions with Language Models
arXiv 2022
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
arXiv 2021
Injecting Numerical Reasoning Skills into Language Models
Transformer Feed-Forward Layers Are Key-Value Memories
EMNLP 2021
Frequent co-authors
10 from 33 papers