Attribution
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization
arXiv 2026
from 1 paper
Abdelrahman M Shaker
Ahmed Heakl
Fahad Shahbaz Khan
Rania Elbadry
Salman Khan
Youssef Mohamed