Can Qin
- Papers: 18
Authored papers (18)
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
arXiv 2026
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
arXiv 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
arXiv 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
arXiv 2025
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
arXiv 2025
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
arXiv 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
arXiv 2025
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
arXiv 2025
CoDA: Coding LM via Diffusion Adaptation
arXiv 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
arXiv 2025
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
CVPR 2025
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
arXiv 2024
Image as Set of Points
arXiv 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
Making Reconstruction-based Method Great Again for Video Anomaly Detection
arXiv 2023
Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework
arXiv 2022
Rethinking Adam: A Twofold Exponential Moving Average Approach
Self-Directed Online Machine Learning for Topology Optimization
arXiv 2020
Affiliations
Frequent co-authors
10 from 18 papers