Teng Wang
- Papers
- 19
Authored papers (19)
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
arXiv 2026
PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
arXiv 2026
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
arXiv 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
arXiv 2025
Reinforcing Video Reasoning with Focused Thinking
arXiv 2025
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
arXiv 2025
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
arXiv 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
arXiv 2025
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
ICCV 2025
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
arXiv 2025
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder
arXiv 2025
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
arXiv 2025
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
arXiv 2025
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
CVPR 2025 1
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving
arXiv 2024
Video Understanding with Large Language Models: A Survey
arXiv 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
arXiv 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
ICCV 2023 1
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
ICCV 2023 1
Frequent co-authors
10 from 19 papers