Hannaneh Hajishirzi

Professor of Computer Science at the University of Washington and Senior Director of AI at the Allen Institute for AI; co-leads OLMo and Tülu, the fully open language-model program.

Role
professor
Papers
68

68 papers · 1 tool contribution

Authored papers

68

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

arXiv 2026

2026

Learning to Detect Language Model Training Data via Active Reconstruction

arXiv 2026

2026

s1: Simple Test-Time Scaling

preprint

2025

Olmo 3

arXiv 2025

2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

arXiv 2025

2025

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

arXiv 2025

2025

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

arXiv 2025

2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

arXiv 2025

2025

Spurious Rewards: Rethinking Training Signals in RLVR

arXiv 2025

2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

arXiv 2025

2025

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

arXiv 2025

2025

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

preprint

2024

2 OLMo 2 Furious

arXiv 2024

2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025 1

2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

2024

RewardBench: Evaluating Reward Models for Language Modeling

arXiv 2024

2024

Data Engineering for Scaling Language Models to 128K Context

arXiv 2024

2024

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

arXiv 2024

2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

arXiv 2024

2024

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

arXiv 2024

2024

Do Membership Inference Attacks Work on Large Language Models?

arXiv 2024

2024

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

arXiv 2024

2024

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

arXiv 2024

2024

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

arXiv 2024

2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

arXiv 2024

2024

How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

arXiv 2024

2024

HREF: Human Response-Guided Evaluation of Instruction Following in Language Models

arXiv 2024

2024

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

arXiv 2023

2023

DataComp: In search of the next generation of multimodal datasets

NeurIPS 2023 11

2023

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

arXiv 2023

2023

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

arXiv 2023

2023

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

arXiv 2023

2023

Crystal: Introspective Reasoners Reinforced with Self-Feedback

arXiv 2023

2023

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

arXiv 2023

2023

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

arXiv 2023

2023

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

arXiv 2022

2022

Editing Models with Task Arithmetic

arXiv 2022

2022

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

arXiv 2022

2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

arXiv 2022

2022

NaturalProver: Grounded Mathematical Proof Generation with Language Models

arXiv 2022

2022

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

arXiv 2022

2022

Self-Instruct: Aligning Language Models with Self-Generated Instructions

arXiv 2022

2022

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

arXiv 2022

2022

Nonparametric Masked Language Modeling

arXiv 2022

2022

Task-aware Retrieval with Instructions

arXiv 2022

2022

CREPE: Open-Domain Question Answering with False Presuppositions

arXiv 2022

2022

MetaICL: Learning to Learn In Context

NAACL 2022 7

2021

Generated Knowledge Prompting for Commonsense Reasoning

ACL 2022 5

2021

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

Findings (NAACL) 2022 7

2021

Probing Across Time: What Does RoBERTa Know and When?

Findings (EMNLP) 2021 11

2021

Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

NAACL 2022 7

2021

Robust fine-tuning of zero-shot models

2021

Efficient Passage Retrieval with Hashing for Open-domain Question Answering

ACL 2021 5

2021

GooAQ: Open Question Answering with Diverse Answer Types

Findings (EMNLP) 2021 11

2021

NaturalProofs: Mathematical Theorem Proving in Natural Language

arXiv 2021

2021

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

EMNLP 2020 11

2020

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Findings (EMNLP) 2020

2020

UnifiedQA: Crossing Format Boundaries With a Single QA System

Findings (EMNLP) 2020

2020

DeLighT: Deep and Light-weight Transformer

2020

XOR QA: Cross-lingual Open-Retrieval Question Answering

NAACL 2021 4

2020

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering

2019

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

2019

A Diagram Is Worth A Dozen Images

arXiv 2016

2016

Tool contributions

1

Affiliations

Currently at

University of Washington

professor · university lab

Frequent co-authors

10

from 68 papers