Weiyun Wang

Papers: 21

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

arXiv 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

arXiv 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

arXiv 2025

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

arXiv 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models

arXiv 2025

Sequential Diffusion Language Models

arXiv 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

arXiv 2025

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

arXiv 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

arXiv 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

arXiv 2024

Needle In A Multimodal Haystack

arXiv 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

arXiv 2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

arXiv 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

arXiv 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

arXiv 2024

Demystify Transformers & Convolutions in Modern Image Deep Networks

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors

from 21 papers

Wenhai Wang

Jifeng Dai

Yu Qiao

Xizhou Zhu

Zhe Chen

Lewei Lu

Tong Lu

Gen Luo

Changyao Tian

Haodong Duan