Yuan Yao
Authored papers (38)
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
arXiv 2026
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?
arXiv 2026
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
arXiv 2026
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
arXiv 2025
Process Reinforcement through Implicit Rewards
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
arXiv 2025
V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models
arXiv 2025
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
arXiv 2025
RLPR: Extrapolating RLVR to General Domains without Verifiers
arXiv 2025
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
arXiv 2025
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking
arXiv 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025 1
Autoregressive Models in Vision: A Survey
arXiv 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
arXiv 2024
Elucidating the design space of language models for image generation
arXiv 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
arXiv 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
arXiv 2024
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
arXiv 2024
Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective
arXiv 2024
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
arXiv 2024
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024 1
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
arXiv 2023
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
arXiv 2023
InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
arXiv 2023
Mitigating the Alignment Tax of RLHF
arXiv 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
arXiv 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
arXiv 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
NeurIPS 2023 11
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
arXiv 2023
Inducing Neural Collapse in Deep Long-tailed Learning
arXiv 2023
An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
ICCV 2023 1
DCT-Net: Domain-Calibrated Translation for Portrait Stylization
arXiv 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Global Convergence of Block Coordinate Descent in Deep Learning
arXiv 2018
Visual Attribute Transfer through Deep Image Analogy
arXiv 2017
Frequent co-authors
10 from 38 papers