Hannaneh Hajishirzi

Professor of CS at UW and Senior Director of AI at the Allen Institute for AI; co-leads OLMo and Tülu, the fully open language-model program.

Role: professor
Currently at: University of Washington
Twitter: twitter.com/HannaHajishirzi
GitHub: github.com/hannaneh
Scholar: scholar.google.com/citations
Papers: 68

Attribution

Affiliations & profile: scholar.google.com/citations

68 papers · 1 tool contribution

Authored papers

68

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

arXiv 2026

Learning to Detect Language Model Training Data via Active Reconstruction

arXiv 2026

s1: Simple Test-Time Scaling

preprint

Olmo 3

arXiv 2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

arXiv 2025

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

arXiv 2025

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

arXiv 2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

arXiv 2025

Spurious Rewards: Rethinking Training Signals in RLVR

arXiv 2025

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

arXiv 2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

arXiv 2025

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

preprint

2 OLMo 2 Furious

arXiv 2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025 1

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

RewardBench: Evaluating Reward Models for Language Modeling

arXiv 2024

Data Engineering for Scaling Language Models to 128K Context

arXiv 2024

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

arXiv 2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

arXiv 2024

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

arXiv 2024

Do Membership Inference Attacks Work on Large Language Models?

arXiv 2024

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

arXiv 2024

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

arXiv 2024

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

arXiv 2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

arXiv 2024

How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

arXiv 2024

HREF: Human Response-Guided Evaluation of Instruction Following in Language Models

arXiv 2024

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

arXiv 2023

DataComp: In search of the next generation of multimodal datasets

NeurIPS 2023 11

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

arXiv 2023

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

arXiv 2023

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

arXiv 2023

Crystal: Introspective Reasoners Reinforced with Self-Feedback

arXiv 2023

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

arXiv 2023

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

arXiv 2023

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

arXiv 2022

Editing Models with Task Arithmetic

arXiv 2022

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

arXiv 2022

Self-Instruct: Aligning Language Models with Self-Generated Instructions

arXiv 2022

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

arXiv 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

arXiv 2022

Nonparametric Masked Language Modeling

arXiv 2022

Task-aware Retrieval with Instructions

arXiv 2022

NaturalProver: Grounded Mathematical Proof Generation with Language Models

arXiv 2022

CREPE: Open-Domain Question Answering with False Presuppositions

arXiv 2022

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

arXiv 2022

MetaICL: Learning to Learn In Context

NAACL 2022 7

Robust fine-tuning of zero-shot models

Efficient Passage Retrieval with Hashing for Open-domain Question Answering

ACL 2021 5

GooAQ: Open Question Answering with Diverse Answer Types

Findings (EMNLP) 2021 11

NaturalProofs: Mathematical Theorem Proving in Natural Language

arXiv 2021

Generated Knowledge Prompting for Commonsense Reasoning

ACL 2022 5

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

Findings (NAACL) 2022 7

Probing Across Time: What Does RoBERTa Know and When?

Findings (EMNLP) 2021 11

Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

NAACL 2022 7

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

EMNLP 2020 11

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Findings (EMNLP) 2020

UnifiedQA: Crossing Format Boundaries With a Single QA System

Findings (EMNLP) 2020

DeLighT: Deep and Light-weight Transformer

XOR QA: Cross-lingual Open-Retrieval Question Answering

NAACL 2021 4

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering

A Diagram Is Worth A Dozen Images

arXiv 2016

Tool contributions

1

Tülu 3 SFT Mixture

Allen Institute for AI (Ai2)

Allen AI's flagship open SFT mixture combining new persona-driven prompts with curated public data for post-training a frontier-quality instruct model.

SFT Dataset · Instruction Following · Math · Code Generation

Affiliations

Currently at

University of Washington

professor · university lab

Previously

Allen Institute for AI (Ai2) · non-profit

Frequent co-authors

10

from 68 papers

Luke Zettlemoyer

professor

19 shared papers

Noah A. Smith

19 shared papers

Sewon Min

17 shared papers

Yejin Choi

professor

17 shared papers

Luca Soldaini

12 shared papers

Pang Wei Koh

12 shared papers

Yizhong Wang

researcher

12 shared papers

Ali Farhadi

CEO

11 shared papers

Jiacheng Liu

11 shared papers

Kyle Lo

10 shared papers