Tulu 3: Pushing Frontiers in Open Language Model Post-Training

Ai2's fully open post-training recipe (data, code, and weights) that combines supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and a novel Reinforcement Learning with Verifiable Rewards (RLVR) stage, yielding models that match Llama 3 Instruct.

Publisher: Allen Institute for AI (Ai2)
Year: 2024
Venue: preprint
ArXiv: arxiv.org/abs/2411.15124
Code: github.com/allenai/open-instruct
Authors: 24
Hosting: External source
License: unknown

Attribution

Abstract & full text: arxiv.org/abs/2411.15124
TL;DR: semanticscholar.org/paper/6a7c29829227bfd65ae0ffec294a874bb9ea0871
Code: github.com/allenai/open-instruct


Introduces 4 artifacts: 2 tools, 2 models

TL;DR

Semantic Scholar

This work introduces Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques.

Artifacts

Tools

Tülu 3 SFT Mixture
Open Instruct

Models

Tülu 3 70B
Tülu 3 (family)