Weixun Wang
Papers: 13
Authored papers (13)
Complementary Reinforcement Learning
arXiv 2026
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling
arXiv 2026
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
arXiv 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
arXiv 2025
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
arXiv 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
arXiv 2025
USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models
arXiv 2025
ProgCo: Program Helps Self-Correction of Large Language Models
arXiv 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
arXiv 2025
Think-J: Learning to Think for Generative LLM-as-a-Judge
arXiv 2025
GEM: A Gym for Agentic LLMs
arXiv 2025
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
arXiv 2024
A2C is a special case of PPO
arXiv 2022
Affiliations
Frequent co-authors
10 from 13 papers