Ziyong Feng
- Papers: 18
Authored papers (18)
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arXiv 2026
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
arXiv 2026
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
arXiv 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
arXiv 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
arXiv 2025
Region-based Cluster Discrimination for Visual Representation Learning
ICCV 2025
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
ICCV 2025
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
arXiv 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
arXiv 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
arXiv 2025
Multi-label Cluster Discrimination for Visual Representation Learning
arXiv 2024
VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling
arXiv 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
arXiv 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
arXiv 2024
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
ICCV 2023
Frequent co-authors
10 from 18 papers