Colin Raffel

Enhancing Training Data Attribution with Representational Optimization

arXiv 2025

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

arXiv 2025

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

arXiv 2025

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

arXiv 2024

A Survey on Data Selection for Language Models

arXiv 2024

Learning to Route Among Specialized Experts for Zero-Shot Generalization

arXiv 2024

Realistic Evaluation of Model Merging for Compositional Generalization

arXiv 2024

TIES-Merging: Resolving Interference When Merging Models

NeurIPS 2023 11

Scaling Data-Constrained Language Models

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

arXiv 2023

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

arXiv 2023

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

arXiv 2023

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

Merging by Matching Models in Task Parameter Subspaces

arXiv 2023

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

ACL 2022 5

Scaling Up Models and Data with t5x and seqio

arXiv 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

arXiv 2022

Petals: Collaborative Inference and Fine-tuning of Large Models

arXiv 2022

What Language Model to Train if You Have One Million GPU Hours?

arXiv 2022

Crosslingual Generalization through Multitask Finetuning

arXiv 2022

Large Language Models Struggle to Learn Long-Tail Knowledge

arXiv 2022

Evaluating the Factual Consistency of Large Language Models Through News Summarization

arXiv 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

arXiv 2022

Do Transformer Modifications Transfer Across Implementations and Applications?

EMNLP 2021 11

Merging Models with Fisher-Weighted Averaging

arXiv 2021

Training Neural Networks with Fixed Sparse Masks

NeurIPS 2021 12

On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition

arXiv 2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

arXiv 2021

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

EMNLP 2020 11

mT5: A massively multilingual pre-trained text-to-text transformer

NAACL 2021 4

Extracting Training Data from Large Language Models

arXiv 2020

Affiliations

No known affiliations.

Frequent co-authors

from 33 papers

Adam Roberts

Derek Tam

Mohit Bansal

Brian Lester

Niklas Muennighoff

grad-student

5 shared papers

Haokun Liu

Nikhil Kandpal

Stella Biderman

founder

Teven Le Scao