Shengyi "Costa" Huang

RLHF researcher; previously at Hugging Face and AI2; now at Periodic Labs. Maintainer of CleanRL and contributor to the Alignment Handbook.

Role: researcher
Currently at: Independent
GitHub: github.com/vwxyzjn
Scholar: scholar.google.com/citations
Papers: 10

Attribution

Affiliations & profile: scholar.google.com/citations

Authored papers

NuminaMath: The Largest Public Dataset in AI4Maths with 860k Pairs of Competition Math Problems and Solutions

blog

2024

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

preprint

2024

2 OLMo 2 Furious

arXiv 2024

2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

arXiv 2024

2024

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

arXiv 2024

2024

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

arXiv 2023

2023

A2C is a special case of PPO

arXiv 2022

2022

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

arXiv 2022

2022

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

arXiv 2021

2021

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

arXiv 2020

2020

Affiliations

Currently at

Independent

researcher · community

Previously

Allen Institute for AI (Ai2) · non-profit
Hugging Face · infra

Frequent co-authors

from 10 papers

Santiago Ontanon

3 shared papers

Arian Hosseini

2 shared papers

Faeze Brahman

researcher

2 shared papers

Hamish Ivison

grad-student

2 shared papers

Hannaneh Hajishirzi

professor

2 shared papers

Jacob Morrison

research-engineer

2 shared papers

Jiayi Weng

2 shared papers

Kashif Rasul

researcher

2 shared papers

Lester James V. Miranda

2 shared papers

Lewis Tunstall

engineer

2 shared papers