Attribution
Multi-Task GRPO: Reliable LLM Reasoning Across Tasks
arXiv 2026
Robust Multi-Objective Controlled Decoding of Large Language Models
arXiv 2025
Group Robust Preference Optimization in Reward-free RLHF
arXiv 2024
from 3 papers
Ilija Bogunovic
Haitham Bou-Ammar
Sangwoong Yoon
Aurelien Lucchi
Iason Chaimalas
Matthieu Zimmer
Pier Giuseppe Sessa
Seongho Son
Viraj Mehta
William Bankes