Wei Xiong
- Papers: 11
Authored papers (11)
Aurora: Unified Video Editing with a Tool-Using Agent
arXiv 2026
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
arXiv 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
arXiv 2025
Self-rewarding correction for mathematical reasoning
arXiv 2025
Diffusion Model-Based Image Editing: A Survey
arXiv 2024
WAS: Dataset and Methods for Artistic Text Segmentation
arXiv 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
arXiv 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
arXiv 2024
Mitigating the Alignment Tax of RLHF
arXiv 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NeurIPS 2023 11
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
NeurIPS 2021 12
Frequent co-authors
10 from 11 papers