Xiang An
Authored papers (17)
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding
arXiv 2026
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arXiv 2026
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
arXiv 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
arXiv 2025
ForCenNet: Foreground-Centric Network for Document Image Rectification
ICCV 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
arXiv 2025
Region-based Cluster Discrimination for Visual Representation Learning
ICCV 2025
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
arXiv 2025
Multi-label Cluster Discrimination for Visual Representation Learning
arXiv 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
arXiv 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
arXiv 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
arXiv 2024
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
ICCV 2023
Partial FC: Training 10 Million Identities on a Single Machine
arXiv 2020
Frequent co-authors
10 from 17 papers