Zesen Cheng
- Papers: 16
Authored papers (16)
Qwen2.5-VL Technical Report
arXiv 2025
Qwen3-VL Technical Report
arXiv 2025
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
arXiv 2025
MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation
arXiv 2025
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
arXiv 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
arXiv 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025 1
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
arXiv 2024
Large Language Models Can Self-Improve in Long-context Reasoning
arXiv 2024
A Survey on the Honesty of Large Language Models
arXiv 2024
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
arXiv 2024
GraCo: Granularity-Controllable Interactive Segmentation
CVPR 2024 1
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
ICCV 2023 1
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
arXiv 2023
FreestyleRet: Retrieving Images from Style-Diversified Queries
arXiv 2023
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
CVPR 2023 1
Affiliations
Frequent co-authors
10 from 16 papers