Min Yang

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

arXiv 2025

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

arXiv 2025

IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

arXiv 2025

PEToolLLM: Towards Personalized Tool Learning in Large Language Models

arXiv 2025

Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost

arXiv 2025

Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

arXiv 2025

ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents

arXiv 2025

VIPER: Process-aware Evaluation for Generative Video Reasoning

arXiv 2025

SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

arXiv 2025

Distillation Quantification for Large Language Models

arXiv 2025

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

arXiv 2024

Agents in Software Engineering: Survey, Landscape, and Vision

arXiv 2024

AutoPatent: A Multi-Agent Framework for Automatic Patent Generation

arXiv 2024

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

arXiv 2024

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

arXiv 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

arXiv 2024

Can MLLMs Understand the Deep Implication Behind Chinese Images?

arXiv 2024

LIME: Less Is More for MLLM Evaluation

arXiv 2024

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

arXiv 2024

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

arXiv 2024

CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

arXiv 2024

CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare

arXiv 2024

AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

arXiv 2024

Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models

arXiv 2024

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception

arXiv 2024

Training on the Benchmark Is Not All You Need

arXiv 2024

PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

arXiv 2023

Marathon: A Race Through the Realm of Long Context with Large Language Models

arXiv 2023

Iterative Forward Tuning Boosts In-Context Learning in Language Models

arXiv 2023

Contrastive Variational Information Bottleneck for Aspect-based Sentiment Analysis

arXiv 2023

JADE: A Linguistics-based Safety Evaluation Platform for Large Language Models

arXiv 2023

Valley: Video Assistant with Large Language model Enhanced abilitY

arXiv 2023

One-Shot Learning as Instruction Data Prospector for Large Language Models

arXiv 2023