Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
arXiv 2025