Colin Raffel

Papers
33

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

33

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

arXiv 2025

2025

Enhancing Training Data Attribution with Representational Optimization

arXiv 2025

2025

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior

arXiv 2025

2025

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

arXiv 2025

2025

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

arXiv 2024

2024

A Survey on Data Selection for Language Models

arXiv 2024

2024

Learning to Route Among Specialized Experts for Zero-Shot Generalization

arXiv 2024

2024

Realistic Evaluation of Model Merging for Compositional Generalization

arXiv 2024

2024

TIES-Merging: Resolving Interference When Merging Models

NeurIPS 2023 11

2023

Scaling Data-Constrained Language Models

2023

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

arXiv 2023

2023

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

2023

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

arXiv 2023

2023

Merging by Matching Models in Task Parameter Subspaces

arXiv 2023

2023

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

arXiv 2023

2023

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

ACL 2022 5

2022

Scaling Up Models and Data with t5x and seqio

arXiv 2022

2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

arXiv 2022

2022

Petals: Collaborative Inference and Fine-tuning of Large Models

arXiv 2022

2022

What Language Model to Train if You Have One Million GPU Hours?

arXiv 2022

2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

arXiv 2022

2022

Large Language Models Struggle to Learn Long-Tail Knowledge

arXiv 2022

2022

Evaluating the Factual Consistency of Large Language Models Through News Summarization

arXiv 2022

2022

Crosslingual Generalization through Multitask Finetuning

arXiv 2022

2022

Do Transformer Modifications Transfer Across Implementations and Applications?

EMNLP 2021 11

2021

Merging Models with Fisher-Weighted Averaging

arXiv 2021

2021

Training Neural Networks with Fixed Sparse Masks

NeurIPS 2021 12

2021

On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition

arXiv 2021

2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

arXiv 2021

2021

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

EMNLP 2020 11

2020

mT5: A massively multilingual pre-trained text-to-text transformer

NAACL 2021 4

2020

Extracting Training Data from Large Language Models

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 33 papers