Chi Chen

Papers
23

Authored papers

23

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

arXiv 2026

2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

arXiv 2026

2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

arXiv 2026

2026

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning

arXiv 2026

2026

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

arXiv 2025

2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

arXiv 2025

2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding

arXiv 2025

2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

arXiv 2025

2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

arXiv 2025

2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

arXiv 2025

2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition

arXiv 2025

2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

CVPR 2025 1

2025

Visual Abstract Thinking Empowers Multimodal Reasoning

arXiv 2025

2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

arXiv 2025

2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing

arXiv 2025

2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

ICCV 2025

2025

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

arXiv 2024

2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

arXiv 2024

2024

Model Composition for Multimodal Large Language Models

arXiv 2024

2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models

arXiv 2024

2024

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

arXiv 2024

2024

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

arXiv 2023

2023

Mask-Align: Self-Supervised Neural Word Alignment

ACL 2021 5

2020

Affiliations

No known affiliations.

Frequent co-authors

10, from 23 papers