Noah A. Smith

Authored papers

60

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

arXiv 2026

2026

Olmo 3

arXiv 2025

2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

arXiv 2025

2025

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

arXiv 2025

2025

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

arXiv 2025

2025

Bolmo: Byteifying the Next Generation of Language Models

arXiv 2025

2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

arXiv 2025

2025

BLAB: Brutally Long Audio Bench

arXiv 2025

2025

2 OLMo 2 Furious

arXiv 2024

2024

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

preprint

2024

OLMo: Accelerating the Science of Language Models

arXiv 2024

2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025

2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

arXiv 2024

2024

OLMoE: Open Mixture-of-Experts Language Models

arXiv 2024

2024

RewardBench: Evaluating Reward Models for Language Modeling

arXiv 2024

2024

Tuning Language Models by Proxy

arXiv 2024

2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

arXiv 2024

2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

arXiv 2024

2024

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

arXiv 2024

2024

What's In My Big Data?

arXiv 2023

2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

ICCV 2023

2023

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

arXiv 2023

2023

Scaling Expert Language Models with Unsupervised Domain Discovery

arXiv 2023

2023

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

arXiv 2023

2023

Time is Encoded in the Weights of Finetuned Language Models

arXiv 2023

2023

We're Afraid Language Models Aren't Modeling Ambiguity

arXiv 2023

2023

In-context Pretraining: Language Modeling Beyond Document Boundaries

arXiv 2023

2023

How Language Model Hallucinations Can Snowball

arXiv 2023

2023

Summarization-Based Document IDs for Generative Retrieval with Language Models

arXiv 2023

2023

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

arXiv 2022

2022

Self-Instruct: Aligning Language Models with Self-Generated Instructions

arXiv 2022

2022

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

arXiv 2022

2022

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

arXiv 2022

2022

Measuring and Narrowing the Compositionality Gap in Language Models

arXiv 2022

2022

Modeling Context With Linear Attention for Scalable Document-Level Translation

arXiv 2022

2022

PromptCap: Prompt-Guided Task-Aware Image Captioning

arXiv 2022

2022

Selective Annotation Makes Language Models Better Few-Shot Learners

arXiv 2022

2022

WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

arXiv 2022

2022

RealTime QA: What's the Answer Right Now?

2022

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

arXiv 2022

2022

A Call for Clarity in Beam Search: How It Works and When It Stops

arXiv 2022

2022

In-Context Learning for Few-Shot Dialogue State Tracking

arXiv 2022

2022

Transparency Helps Reveal When Language Models Learn Meaning

arXiv 2022

2022

Binding Language Models in Symbolic Languages

arXiv 2022

2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

2021

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

ACL 2021

2021

A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

NAACL 2021

2021

DEMix Layers: Disentangling Domains for Modular Language Modeling

NAACL 2022

2021

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

NAACL 2022

2021

Probing Across Time: What Does RoBERTa Know and When?

Findings (EMNLP) 2021

2021

Challenges in Automated Debiasing for Toxic Language Detection

EACL 2021

2021

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

EMNLP 2020

2020

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

2020

Shortformer: Better Language Modeling using Shorter Inputs

ACL 2021

2020

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

2020

Knowledge Enhanced Contextual Word Representations

2019

Dynamic Entity Representations in Neural Language Models

2017

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

2015

Retrofitting Word Vectors to Semantic Lexicons

2014

Affiliations

No known affiliations.

Frequent co-authors

10

from 60 papers