Chi Chen
- Papers: 23
Authored papers (23)
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
arXiv 2026
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
arXiv 2026
Imagination Helps Visual Reasoning, But Not Yet in Latent Space
arXiv 2026
Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
arXiv 2026
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
arXiv 2025
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
arXiv 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
arXiv 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
arXiv 2025
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
arXiv 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025 1
Visual Abstract Thinking Empowers Multimodal Reasoning
arXiv 2025
Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
arXiv 2025
ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing
arXiv 2025
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
ICCV 2025
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
arXiv 2024
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
arXiv 2024
Model Composition for Multimodal Large Language Models
arXiv 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
arXiv 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
arXiv 2024
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
arXiv 2023
Mask-Align: Self-Supervised Neural Word Alignment
ACL 2021 5
Frequent co-authors: 10 (from 23 papers)