Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.
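The reward signal described in the abstract, where a Chain-of-Thought solution is rewarded when its answer agrees with the code-executed answer, can be illustrated with a minimal sketch. This is not the Loong implementation: the function names, the `Answer:` marker convention, and the exact-string comparison are all assumptions made for illustration, and a real system would run the code in a sandbox rather than with `exec`.

```python
import contextlib
import io


def run_reference_code(code: str) -> str:
    """Run the code paired with a question and return what it prints.

    Hypothetical convention: the code prints the final answer to stdout.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # illustration only; untrusted code needs a sandbox
    return buffer.getvalue().strip()


def extract_final_answer(cot: str) -> str:
    """Return the text after the last 'Answer:' marker (assumed format)."""
    marker = "Answer:"
    idx = cot.rfind(marker)
    return cot[idx + len(marker):].strip() if idx != -1 else ""


def verifiable_reward(cot: str, code: str) -> float:
    """Binary reward: 1.0 if the CoT answer matches the executed answer."""
    predicted = extract_final_answer(cot)
    if not predicted:
        return 0.0  # no parsable answer, so no reward
    return 1.0 if predicted == run_reference_code(code) else 0.0


# Toy question-answer-code triple (made up for this sketch).
reference_code = "print(sum(range(1, 11)))"
good_cot = "The sum of 1..10 is 10 * 11 / 2. Answer: 55"
bad_cot = "Adding the numbers gives 54. Answer: 54"
```

Under these assumptions, `verifiable_reward(good_cot, reference_code)` yields the full reward and `verifiable_reward(bad_cot, reference_code)` yields none. Exact string matching is the simplest possible verifier; domains with numeric tolerance or symbolic answers would need a more forgiving comparison.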
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 46
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2509.03059
- TL;DR
- Semantic Scholar
Authors
Chengxing Xie, Haoran Wang, Hao Wang, Zeyu Zhang, Yifan Wu, Bowen Li, Fan Yang, Guohao Li, Bernard Ghanem, Wendong Fan, Ziyu Ye, Yifeng Wang, Qianshuo Ye, Philip Torr, Guangtao Zeng, ZiYi Yang, Yang Wang, Hao Sun, Tong Liu, Yunpu Ma, Zihao Zhu, Ruohao Guo, Zhaowei Wang, Hao Shen, Yuan He, Ziyang Wang, Xin Gao, Junxiao Yang, Jinhe Bi, Xingyue Huang, Rishabh, Gregor Franke, Jiamu Bai, Weijie Bai, Zifeng Ding, Yiqun Duan, Chengyu Fan, Zhuangzhuang He, Xianglong Hu, Neil Johnson, Fangru Lin, Siyu Lin, Beibei Wang, Fangyijie Wang, Zikai Xiao, Yuwen Ebony Zhang