Kevin Qinghong Lin
Authored papers (29)
AI for Auto-Research: Roadmap & User Guide
arXiv 2026
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
arXiv 2026
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
arXiv 2026
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
arXiv 2026
Code2World: A GUI World Model via Renderable Code Generation
arXiv 2026
Code2Video: A Code-centric Paradigm for Educational Video Generation
arXiv 2025
Paper2Video: Automatic Video Generation from Scientific Papers
arXiv 2025
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
arXiv 2025
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
arXiv 2025
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
arXiv 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
arXiv 2025
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
arXiv 2025
Grounding Computer Use Agents on Human Demonstrations
arXiv 2025
Computer-Use Agents as Judges for Generative User Interface
arXiv 2025
Reinforcement Learning in Vision: A Survey
arXiv 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
arXiv 2025
ROICtrl: Boosting Instance Control for Visual Generation
CVPR 2025
Learning Long-form Video Prior via Generative Pre-Training
arXiv 2024
Learning Video Context as Interleaved Multimodal Sequences
arXiv 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
arXiv 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
CVPR 2025
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
ICCV 2023
Bootstrapping SparseFormers from Vision Foundation Models
CVPR 2024
Too Large; Data Reduction for Vision-Language Pre-Training
ICCV 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023
VisorGPT: Learning Visual Prior via Generative Pre-Training
arXiv 2023
Egocentric Video-Language Pretraining
arXiv 2022
Frequent co-authors
10 from 29 papers