Qin Jin
- Papers: 16
Authored papers (16)
WritingBench: A Comprehensive Benchmark for Generative Writing
arXiv 2025
DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models
arXiv 2025
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
arXiv 2025
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM
arXiv 2025
A Survey of Deep Learning for Geometry Problem Solving
arXiv 2025
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
arXiv 2025
SPAFormer: Sequential 3D Part Assembly with Transformers
arXiv 2024
EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
arXiv 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
arXiv 2024
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
arXiv 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
arXiv 2024
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
arXiv 2023
Movie101: A New Movie Understanding Benchmark
arXiv 2023
Rethinking Benchmarks for Cross-modal Image-text Retrieval
arXiv 2023
MPMQA: Multimodal Question Answering on Product Manuals
arXiv 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
CVPR 2023
Frequent co-authors: 10 (from 16 papers)