Attribution
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
STEP3-VL-10B Technical Report
Reward Shaping to Mitigate Reward Hacking in RLHF
arXiv 2025
from 3 papers
Chengyuan Yao
Qi Han
Ailin Huang
Ang Li
Aobo Kong
Bo Dong
Changyi Wan
Chunrui Han
Daxin Jiang (founder)
Di Qi