Alex Jinpeng Wang
- Papers: 15
Authored papers (15)
MIND: Benchmarking Memory Consistency and Action Control in World Models
arXiv 2026
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
arXiv 2026
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models
arXiv 2026
Glance: Accelerating Diffusion Models with 1 Sample
arXiv 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
arXiv 2025
TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering
arXiv 2025
From Charts to Code: A Hierarchical Benchmark for Multimodal Models
arXiv 2025
V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models
arXiv 2025
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
arXiv 2024
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023
Too Large; Data Reduction for Vision-Language Pre-Training
ICCV 2023
Parrot Captions Teach CLIP to Spot Text
arXiv 2023
All in One: Exploring Unified Video-Language Pre-training
CVPR 2023
Egocentric Video-Language Pretraining
arXiv 2022
Position-guided Text Prompt for Vision-Language Pre-training
CVPR 2023
Frequent co-authors
10 from 15 papers