Yuan Yao

Papers
38

Authored papers

38

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

arXiv 2026

2026

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

arXiv 2026

2026

Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction

arXiv 2026

2026

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

arXiv 2025

2025

Process Reinforcement through Implicit Rewards

arXiv 2025

2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

arXiv 2025

2025

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

arXiv 2025

2025

V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models

arXiv 2025

2025

FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

arXiv 2025

2025

RLPR: Extrapolating RLVR to General Domains without Verifiers

arXiv 2025

2025

Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

arXiv 2025

2025

Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

arXiv 2025

2025

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

CVPR 2025 1

2024

Autoregressive Models in Vision: A Survey

arXiv 2024

2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

arXiv 2024

2024

Elucidating the design space of language models for image generation

arXiv 2024

2024

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

arXiv 2024

2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

arXiv 2024

2024

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

arXiv 2024

2024

Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective

arXiv 2024

2024

MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

arXiv 2024

2024

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

CVPR 2024 1

2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

arXiv 2023

2023

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

arXiv 2023

2023

InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

arXiv 2023

2023

Mitigating the Alignment Tax of RLHF

arXiv 2023

2023

NExT-Chat: An LMM for Chat, Detection and Segmentation

arXiv 2023

2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

arXiv 2023

2023

VPGTrans: Transfer Visual Prompt Generator across LLMs

NeurIPS 2023 11

2023

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

arXiv 2023

2023

Inducing Neural Collapse in Deep Long-tailed Learning

arXiv 2023

2023

An Embarrassingly Simple Backdoor Attack on Self-supervised Learning

ICCV 2023 1

2022

DCT-Net: Domain-Calibrated Translation for Portrait Stylization

arXiv 2022

2022

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

2021

OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction

2019

DocRED: A Large-Scale Document-Level Relation Extraction Dataset

2019

Global Convergence of Block Coordinate Descent in Deep Learning

arXiv 2018

2018

Visual Attribute Transfer through Deep Image Analogy

arXiv 2017

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 38 papers