Weidong Cai
Papers: 17
Authored papers
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arXiv 2026
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
arXiv 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
arXiv 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
arXiv 2025
The Collapse of Patches
arXiv 2025
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
arXiv 2025
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
arXiv 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
arXiv 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
arXiv 2024
Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images
arXiv 2024
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
arXiv 2023
CelebV-Text: A Large-Scale Facial Text-Video Dataset
CVPR 2023
PaRot: Patch-Wise Rotation-Invariant Network via Feature Disentanglement and Pose Restoration
arXiv 2023
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation
ICCV 2023
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
arXiv 2022
SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
CVPR 2023
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization