Luke Zettlemoyer
Professor of NLP at the University of Washington and research director at Meta FAIR; one of the most cited NLP researchers (ELMo, BART, OPT, Llama line).
- Role: professor
- Currently at: University of Washington
- Twitter: twitter.com/LukeZettlemoyer
- GitHub: Unknown
- Scholar: scholar.google.com/citations
- Papers: 81
Authored papers (81)
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
arXiv 2026
Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths
arXiv 2026
Micro Language Models Enable Instant Responses
arXiv 2026
Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
arXiv 2026
s1: Simple Test-Time Scaling
preprint
DreamGen: Unlocking Generalization in Robot Learning through Video World Models
arXiv 2025
Olmo 3
arXiv 2025
ReasonIR: Training Retrievers for Reasoning Tasks
arXiv 2025
(Mis)Fitting: A Survey of Scaling Laws
arXiv 2025
Reconstruction Alignment Improves Unified Multimodal Models
arXiv 2025
Bolmo: Byteifying the Next Generation of Language Models
arXiv 2025
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
arXiv 2025
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
arXiv 2025
Meta CLIP 2: A Worldwide Scaling Recipe
arXiv 2025
Spurious Rewards: Rethinking Training Signals in RLVR
arXiv 2025
FlexOlmo: Open Language Models for Flexible Data Use
arXiv 2025
OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities
arXiv 2025
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
arXiv 2025
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
arXiv 2025
2 OLMo 2 Furious
arXiv 2024
OLMo: Accelerating the Science of Language Models
arXiv 2024
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
arXiv 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
arXiv 2024
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
arXiv 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
arXiv 2024
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
arXiv 2024
Do Membership Inference Attacks Work on Large Language Models?
arXiv 2024
Memory Layers at Scale
arXiv 2024
Byte Latent Transformer: Patches Scale Better Than Tokens
arXiv 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
arXiv 2024
Negative Token Merging: Image-based Adversarial Feature Guidance
arXiv 2024
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
arXiv 2024
Stable and low-precision training for large-scale vision-language models
NeurIPS 2023
QLoRA: Efficient Finetuning of Quantized LLMs
NeurIPS 2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
arXiv 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
arXiv 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
arXiv 2023
Shepherd: A Critic for Language Model Generation
arXiv 2023
Representation Deficiency in Masked Language Modeling
arXiv 2023
Scaling Expert Language Models with Unsupervised Domain Discovery
arXiv 2023
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
arXiv 2023
In-context Pretraining: Language Modeling Beyond Document Boundaries
arXiv 2023
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
arXiv 2022
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
arXiv 2022
InCoder: A Generative Model for Code Infilling and Synthesis
arXiv 2022
Mega: Moving Average Equipped Gated Attention
arXiv 2022
Contrastive Decoding: Open-ended Text Generation as Optimization
arXiv 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
arXiv 2022
Nonparametric Masked Language Modeling
arXiv 2022
CREPE: Open-Domain Question Answering with False Presuppositions
arXiv 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
arXiv 2022
OPT: Open Pre-trained Transformer Language Models
arXiv 2022
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
arXiv 2022
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
arXiv 2022
Selective Annotation Makes Language Models Better Few-Shot Learners
arXiv 2022
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
arXiv 2022
Improving Passage Retrieval with Zero-Shot Question Generation
arXiv 2022
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
arXiv 2022
Questions Are All You Need to Train a Dense Passage Retriever
arXiv 2022
M2D2: A Massively Multi-domain Language Modeling Dataset
arXiv 2022
Binding Language Models in Symbolic Languages
arXiv 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
arXiv 2022
Few-shot Learning with Multilingual Language Models
arXiv 2021
8-bit Optimizers via Block-wise Quantization
MetaICL: Learning to Learn In Context
NAACL 2022
SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark
arXiv 2021
Multilingual Autoregressive Entity Linking
arXiv 2021
DEMix Layers: Disentangling Domains for Modular Language Modeling
NAACL 2022
Multilingual Denoising Pre-training for Neural Machine Translation
arXiv 2020
DeLighT: Deep and Light-weight Transformer
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Unsupervised Cross-lingual Representation Learning at Scale
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Sparse Networks from Scratch: Faster Training without Losing Performance
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Scalable Zero-shot Entity Linking with Dense Entity Retrieval
EMNLP 2020
SpanBERT: Improving Pre-training by Representing and Predicting Spans
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System
Large-Scale QA-SRL Parsing
Mapping Language to Code in Programmatic Context
Affiliations
Frequent co-authors
10 from 81 papers