Kaicheng Yang
- Papers: 19
Authored papers (19)
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
arXiv 2026
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arXiv 2026
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
arXiv 2026
Improving Sampling for Masked Diffusion Models via Information Gain
arXiv 2026
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
arXiv 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
arXiv 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
arXiv 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
arXiv 2025
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
arXiv 2025
ForCenNet: Foreground-Centric Network for Document Image Rectification
ICCV 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
arXiv 2025
Region-based Cluster Discrimination for Visual Representation Learning
ICCV 2025
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
arXiv 2024
Multi-label Cluster Discrimination for Visual Representation Learning
arXiv 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
arXiv 2024
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
ICCV 2023
Frequent co-authors: 10 (from 19 papers)