Maosong Sun
Tsinghua professor and founder of THUNLP; senior figure behind OpenBMB, CPM, MiniCPM, and a long line of Chinese NLP foundations.
- Role: professor
- Currently at: Tsinghua University
- Scholar: scholar.google.com/citations
- Papers: 119
Authored papers (119)
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
arXiv 2026
From Context to Skills: Can Language Models Learn from Context Skillfully?
arXiv 2026
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
arXiv 2026
Data Science and Technology Towards AGI Part I: Tiered Data Management
arXiv 2026
SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization
arXiv 2026
Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
arXiv 2026
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
arXiv 2026
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research
arXiv 2026
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
arXiv 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
arXiv 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
Process Reinforcement through Implicit Rewards
arXiv 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
arXiv 2025
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
arXiv 2025
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
arXiv 2025
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning
arXiv 2025
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
arXiv 2025
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
arXiv 2025
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
arXiv 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
arXiv 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
arXiv 2025
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
arXiv 2025
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
arXiv 2025
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
arXiv 2025
HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization
arXiv 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
arXiv 2025
FaithLens: Detecting and Explaining Faithfulness Hallucination
arXiv 2025
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
arXiv 2025
RLPR: Extrapolating RLVR to General Domains without Verifiers
arXiv 2025
StateX: Enhancing RNN Recall via Post-training State Expansion
arXiv 2025
LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources
arXiv 2025
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning
arXiv 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
arXiv 2025
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
arXiv 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation
arXiv 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
arXiv 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
arXiv 2024
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
arXiv 2024
UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset
arXiv 2024
Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset
ICCV 2025
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
arXiv 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
arXiv 2024
Advancing LLM Reasoning Generalists with Preference Trees
arXiv 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
arXiv 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
arXiv 2024
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
arXiv 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
arXiv 2024
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
arXiv 2024
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization
arXiv 2024
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
arXiv 2024
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
arXiv 2024
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation
arXiv 2024
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing
arXiv 2024
GATEAU: Selecting Influential Samples for Long Context Alignment
arXiv 2024
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
arXiv 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
arXiv 2024
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
arXiv 2024
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
arXiv 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
arXiv 2024
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
arXiv 2024
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs
arXiv 2024
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
arXiv 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
arXiv 2024
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
arXiv 2024
DebugBench: Evaluating Debugging Capability of Large Language Models
arXiv 2024
Model Composition for Multimodal Large Language Models
arXiv 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
arXiv 2024
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
arXiv 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
arXiv 2024
Robust and Scalable Model Editing for Large Language Models
arXiv 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
arXiv 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
arXiv 2024
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
EMNLP
UltraFeedback: Boosting Language Models with High-quality Feedback
ICML
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
arXiv 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
arXiv 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
arXiv 2023
Tool Learning with Foundation Models
arXiv 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
arXiv 2023
Sparse Low-rank Adaptation of Pre-trained Language Models
arXiv 2023
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
arXiv 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
arXiv 2023
Won't Get Fooled Again: Answering Questions with False Premises
arXiv 2023
OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models
arXiv 2023
ProAgent: From Robotic Process Automation to Agentic Process Automation
arXiv 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024
Plug-and-Play Knowledge Injection for Pre-trained Language Models
arXiv 2023
TunesFormer: Forming Tunes with Control Codes
arXiv 2023
Plug-and-Play Document Modules for Pre-trained Models
arXiv 2023
MUSER: A Multi-View Similar Case Retrieval Dataset
arXiv 2023
ConPET: Continual Parameter-Efficient Tuning for Large Language Models
arXiv 2023
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
arXiv 2023
Exploring Format Consistency for Instruction Tuning
arXiv 2023
Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
arXiv 2023
Chord-Conditioned Melody Harmonization with Controllable Harmonicity
arXiv 2022
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
arXiv 2022
Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation
An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation
arXiv 2022
Packed Levitated Marker for Entity and Relation Extraction
ACL 2022
Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
arXiv 2021
Fully Hyperbolic Neural Networks
ACL 2022
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Findings (ACL) 2022
Sub-Character Tokenization for Chinese Pretrained Language Models
arXiv 2021
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
ACL 2021
OpenPrompt: An Open-source Framework for Prompt-learning
ACL 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Mask-Align: Self-Supervised Neural Word Alignment
ACL 2021
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
arXiv 2020
CPM: A Large-scale Generative Chinese Pre-trained Language Model
arXiv 2020
Coreferential Reasoning Learning for Language Representation
EMNLP 2020
OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
Word-level Textual Adversarial Attacking as Combinatorial Optimization
Affiliations
Frequent co-authors
10 from 119 papers