Colin Raffel
- Papers: 33
Authored papers
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
arXiv 2025
Enhancing Training Data Attribution with Representational Optimization
arXiv 2025
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
arXiv 2025
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
arXiv 2025
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
arXiv 2024
A Survey on Data Selection for Language Models
arXiv 2024
Learning to Route Among Specialized Experts for Zero-Shot Generalization
arXiv 2024
Realistic Evaluation of Model Merging for Compositional Generalization
arXiv 2024
TIES-Merging: Resolving Interference When Merging Models
NeurIPS 2023 11
Scaling Data-Constrained Language Models
Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
arXiv 2023
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
arXiv 2023
Merging by Matching Models in Task Parameter Subspaces
arXiv 2023
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
arXiv 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
ACL 2022 5
Scaling Up Models and Data with t5x and seqio
arXiv 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
arXiv 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
arXiv 2022
What Language Model to Train if You Have One Million GPU Hours?
arXiv 2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
arXiv 2022
Large Language Models Struggle to Learn Long-Tail Knowledge
arXiv 2022
Evaluating the Factual Consistency of Large Language Models Through News Summarization
arXiv 2022
Crosslingual Generalization through Multitask Finetuning
arXiv 2022
Do Transformer Modifications Transfer Across Implementations and Applications?
EMNLP 2021 11
Merging Models with Fisher-Weighted Averaging
arXiv 2021
Training Neural Networks with Fixed Sparse Masks
NeurIPS 2021 12
On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition
arXiv 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
arXiv 2021
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
EMNLP 2020 11
mT5: A massively multilingual pre-trained text-to-text transformer
NAACL 2021 4
Extracting Training Data from Large Language Models
arXiv 2020