Shuang Qiu

Attribution

5 papers

Authored papers

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

arXiv 2025

Self-Reflective Generation at Test Time

arXiv 2025

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

arXiv 2024

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

arXiv 2024

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

arXiv 2022

No known affiliations.

Co-authors from 5 papers

Rui Yang

Chengwei Qin

Chenjia Bai

Dan Ye

Dong Yu

Feng Luo

Han Zhao

Han Zhong

Haoxiang Wang

Jian Mu