Improving Multimodal Reasoning via Worst Dimension Optimization

Year: 2026
ArXiv: arxiv.org/abs/2606.07801
Hosting: Full text hosted, CC-BY-4.0

Attribution

Abstract & full text: arxiv.org/abs/2606.07801, CC-BY-4.0

Abstract

Multimodal reasoning requires a reasoning path that remains sound across a wide range of constraints, from visual grounding to logical consistency. However, current Process Reward Models rely on heuristically defined rewards that weight these factors equally. As a result, a failure in one dimension can be masked by strong scores in the dominant dimensions, and the validity of the reasoning process as a whole is not guaranteed.
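
The masking effect described above can be illustrated with a small numeric sketch. This is not the paper's method: the dimension names, the score values, and the helper functions `equal_weight_reward`, `worst_dimension_reward`, and `worst_dimension` are all hypothetical, chosen only to contrast an equally weighted average with a score taken from the weakest dimension.

```python
from statistics import mean

# Hypothetical per-dimension scores in [0, 1] for a single reasoning step.
# The dimension names and values are illustrative assumptions, not data
# from the paper.
step_scores = {
    "visual_grounding": 0.1,      # the step misreads the image
    "logical_consistency": 0.9,
    "step_relevance": 0.9,        # hypothetical extra dimension
    "answer_formatting": 0.9,     # hypothetical extra dimension
}


def equal_weight_reward(scores):
    """Average all dimensions with equal weight."""
    return mean(scores.values())


def worst_dimension_reward(scores):
    """Score the step by its weakest dimension only."""
    return min(scores.values())


def worst_dimension(scores):
    """Name of the dimension with the lowest score."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    print("equal-weight reward:", round(equal_weight_reward(step_scores), 3))
    print("worst-dimension reward:", worst_dimension_reward(step_scores))
    print("weakest dimension:", worst_dimension(step_scores))
```

Under these assumed scores, the equally weighted average stays close to 0.7 even though visual grounding has almost entirely failed, while the minimum over dimensions exposes the failure directly.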