Maosong Sun

Tsinghua professor and founder of THUNLP; senior figure behind OpenBMB, CPM, MiniCPM, and a long line of Chinese NLP foundations.

Role
professor
Papers
119

Authored papers

119

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

arXiv 2026

2026

From Context to Skills: Can Language Models Learn from Context Skillfully?

arXiv 2026

2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

arXiv 2026

2026

Data Science and Technology Towards AGI Part I: Tiered Data Management

arXiv 2026

2026

SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization

arXiv 2026

2026

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

arXiv 2026

2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

arXiv 2026

2026

AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

arXiv 2026

2026

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

arXiv 2025

2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

arXiv 2025

2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

arXiv 2025

2025

Process Reinforcement through Implicit Rewards

arXiv 2025

2025

Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts

arXiv 2025

2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

arXiv 2025

2025

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

arXiv 2025

2025

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

arXiv 2025

2025

AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

arXiv 2025

2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

arXiv 2025

2025

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity

arXiv 2025

2025

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages

arXiv 2025

2025

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

arXiv 2025

2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

arXiv 2025

2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

arXiv 2025

2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition

arXiv 2025

2025

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

arXiv 2025

2025

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

arXiv 2025

2025

HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

arXiv 2025

2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

CVPR 2025

2025

Cost-Optimal Grouped-Query Attention for Long-Context Modeling

arXiv 2025

2025

FaithLens: Detecting and Explaining Faithfulness Hallucination

arXiv 2025

2025

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

arXiv 2025

2025

RLPR: Extrapolating RLVR to General Domains without Verifiers

arXiv 2025

2025

StateX: Enhancing RNN Recall via Post-training State Expansion

arXiv 2025

2025

LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

arXiv 2025

2025

PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning

arXiv 2025

2025

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

arXiv 2025

2025

DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection

arXiv 2025

2025

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

CVPR 2025

2024

RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation

arXiv 2024

2024

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

arXiv 2024

2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

arXiv 2024

2024

$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

arXiv 2024

2024

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

arXiv 2024

2024

Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset

ICCV 2025

2024

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

arXiv 2024

2024

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

arXiv 2024

2024

Advancing LLM Reasoning Generalists with Preference Trees

arXiv 2024

2024

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

arXiv 2024

2024

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

arXiv 2024

2024

WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models

arXiv 2024

2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

arXiv 2024

2024

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

arXiv 2024

2024

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

arXiv 2024

2024

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

arXiv 2024

2024

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

arXiv 2024

2024

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

arXiv 2024

2024

MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing

arXiv 2024

2024

GATEAU: Selecting Influential Samples for Long Context Alignment

arXiv 2024

2024

OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

arXiv 2024

2024

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

arXiv 2024

2024

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

arXiv 2024

2024

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

arXiv 2024

2024

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

arXiv 2024

2024

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

arXiv 2024

2024

Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs

arXiv 2024

2024

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

arXiv 2024

2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

arXiv 2024

2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

arXiv 2024

2024

DebugBench: Evaluating Debugging Capability of Large Language Models

arXiv 2024

2024

Model Composition for Multimodal Large Language Models

arXiv 2024

2024

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

arXiv 2024

2024

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

arXiv 2024

2024

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

arXiv 2024

2024

Robust and Scalable Model Editing for Large Language Models

arXiv 2024

2024

Exploring Perceptual Limitation of Multimodal Large Language Models

arXiv 2024

2024

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

arXiv 2024

2024

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

EMNLP

2023

UltraFeedback: Boosting Language Models with High-quality Feedback

ICML

2023

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

arXiv 2023

2023

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

arXiv 2023

2023

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

2023

CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

arXiv 2023

2023

Tool Learning with Foundation Models

arXiv 2023

2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

arXiv 2023

2023

Sparse Low-rank Adaptation of Pre-trained Language Models

arXiv 2023

2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

arXiv 2023

2023

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

arXiv 2023

2023

Won't Get Fooled Again: Answering Questions with False Premises

arXiv 2023

2023

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

arXiv 2023

2023

ProAgent: From Robotic Process Automation to Agentic Process Automation

arXiv 2023

2023

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

CVPR 2024

2023

Plug-and-Play Knowledge Injection for Pre-trained Language Models

arXiv 2023

2023

TunesFormer: Forming Tunes with Control Codes

arXiv 2023

2023

Plug-and-Play Document Modules for Pre-trained Models

arXiv 2023

2023

MUSER: A Multi-View Similar Case Retrieval Dataset

arXiv 2023

2023

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

arXiv 2023

2023

Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub

arXiv 2023

2023

Exploring Format Consistency for Instruction Tuning

arXiv 2023

2023

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

arXiv 2023

2023

Chord-Conditioned Melody Harmonization with Controllable Harmonicity

arXiv 2022

2022

Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

arXiv 2022

2022

Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation

2022

An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation

arXiv 2022

2022

Packed Levitated Marker for Entity and Relation Extraction

ACL 2022

2021

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents

arXiv 2021

2021

Fully Hyperbolic Neural Networks

ACL 2022

2021

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts

Findings (ACL) 2022

2021

Sub-Character Tokenization for Chinese Pretrained Language Models

arXiv 2021

2021

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

ACL 2021

2021

OpenPrompt: An Open-source Framework for Prompt-learning

ACL 2022

2021

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

2021

Mask-Align: Self-Supervised Neural Word Alignment

ACL 2021

2020

CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models

arXiv 2020

2020

CPM: A Large-scale Generative Chinese Pre-trained Language Model

arXiv 2020

2020

Coreferential Reasoning Learning for Language Representation

EMNLP 2020

2020

OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction

2019

DocRED: A Large-Scale Document-Level Relation Extraction Dataset

2019

FewRel 2.0: Towards More Challenging Few-Shot Relation Classification

2019

Word-level Textual Adversarial Attacking as Combinatorial Optimization

2019

Affiliations

Currently at

Tsinghua University

professor · university lab

Frequent co-authors

10

from 119 papers