Shengyi "Costa" Huang
RLHF researcher; previously at Hugging Face and AI2; now at Periodic Labs. Maintainer of CleanRL and contributor to the Alignment Handbook.
- Role: researcher
- Currently at: Independent
- GitHub: github.com/vwxyzjn
- Scholar: scholar.google.com/citations
- Papers: 10
Authored papers (10)
NuminaMath: The Largest Public Dataset in AI4Maths with 860k Pairs of Competition Math Problems and Solutions
blog
Tulu 3: Pushing Frontiers in Open Language Model Post-Training
preprint
2 OLMo 2 Furious
arXiv 2024
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
arXiv 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
arXiv 2024
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
arXiv 2023
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
arXiv 2022
A2C is a special case of PPO
arXiv 2022
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms
arXiv 2021
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
arXiv 2020
Affiliations
Frequent co-authors
10 from 10 papers
Santiago Ontanon
Arian Hosseini
Faeze Brahman
researcher
Hamish Ivison
grad-student
Hannaneh Hajishirzi
professor
Jacob Morrison
research-engineer
Jiayi Weng
Kashif Rasul
researcher
Lester James V. Miranda
Lewis Tunstall
engineer