Yuan Yao

Process Reinforcement through Implicit Rewards

arXiv 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

arXiv 2025

FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

arXiv 2025

RLPR: Extrapolating RLVR to General Domains without Verifiers

arXiv 2025

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

arXiv 2025

Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

arXiv 2025

Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking

arXiv 2025

V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models

arXiv 2025

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

CVPR 2025

Autoregressive Models in Vision: A Survey

arXiv 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

arXiv 2024

Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective

arXiv 2024

MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

arXiv 2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

arXiv 2024

Elucidating the Design Space of Language Models for Image Generation

arXiv 2024

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

arXiv 2024

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

arXiv 2024

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

arXiv 2023

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

CVPR 2024

VPGTrans: Transfer Visual Prompt Generator across LLMs

NeurIPS 2023

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

arXiv 2023

Mitigating the Alignment Tax of RLHF

arXiv 2023

Inducing Neural Collapse in Deep Long-tailed Learning

arXiv 2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

arXiv 2023

KoLA: Carefully Benchmarking World Knowledge of Large Language Models

arXiv 2023

InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

arXiv 2023

NExT-Chat: An LMM for Chat, Detection and Segmentation

arXiv 2023