Ming Yan

Papers
35

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

arXiv 2026

2026

AgentOCR: Reimagining Agent History via Optical Self-Compression

arXiv 2026

2026

Do Phone-Use Agents Respect Your Privacy?

arXiv 2026

2026

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

arXiv 2025

2025

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

arXiv 2025

2025

MAGI-1: Autoregressive Video Generation at Scale

arXiv 2025

2025

WritingBench: A Comprehensive Benchmark for Generative Writing

arXiv 2025

2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding

arXiv 2025

2025

Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

arXiv 2025

2025

Mobile-Agent-v3: Fundamental Agents for GUI Automation

arXiv 2025

2025

Qwen3Guard Technical Report

arXiv 2025

2025

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

arXiv 2025

2025

Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization

arXiv 2025

2025

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

arXiv 2025

2025

WebSailor: Navigating Super-human Reasoning for Web Agent

arXiv 2025

2025

UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

arXiv 2025

2025

Perception-Aware Policy Optimization for Multimodal Reasoning

arXiv 2025

2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

CVPR 2025 1

2025

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

arXiv 2024

2024

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

arXiv 2024

2024

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

arXiv 2024

2024

Model Composition for Multimodal Large Language Models

arXiv 2024

2024

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

CVPR 2025 1

2024

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

arXiv 2024

2024

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

arXiv 2023

2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

arXiv 2023

2023

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

arXiv 2023

2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

CVPR 2024 1

2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

arXiv 2023

2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

arXiv 2023

2023

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding

arXiv 2023

2023

Improved Visual Fine-tuning with Natural Language Supervision

ICCV 2023 1

2023

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility

arXiv 2023

2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models

arXiv 2023

2023

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 35 papers)