Zesen Cheng

Papers: 16

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Qwen2.5-VL Technical Report

arXiv 2025

2025

Qwen3-VL Technical Report

arXiv 2025

2025

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

arXiv 2025

2025

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

arXiv 2025

2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

arXiv 2024

2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

arXiv 2024

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025

2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

arXiv 2024

2024

Large Language Models Can Self-Improve in Long-context Reasoning

arXiv 2024

2024

A Survey on the Honesty of Large Language Models

arXiv 2024

2024

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

arXiv 2024

2024

GraCo: Granularity-Controllable Interactive Segmentation

CVPR 2024

2024

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

ICCV 2023

2023

FreestyleRet: Retrieving Images from Style-Diversified Queries

arXiv 2023

2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

arXiv 2023

2023

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

CVPR 2023

2022

Affiliations

No known affiliations.

Frequent co-authors

from 16 papers

Hang Zhang

Chang Liu

Kehan Li

Peng Jin

Deli Zhao

Jie Chen

Lidong Bing

Xin Li

Hao Li

Li Yuan