Kevin Qinghong Lin

Papers: 29

Authored papers

AI for Auto-Research: Roadmap & User Guide

arXiv 2026

2026

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

arXiv 2026

2026

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

arXiv 2026

2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

arXiv 2026

2026

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

arXiv 2026

2026

Code2World: A GUI World Model via Renderable Code Generation

arXiv 2026

2026

Code2Video: A Code-centric Paradigm for Educational Video Generation

arXiv 2025

2025

Paper2Video: Automatic Video Generation from Scientific Papers

arXiv 2025

2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

arXiv 2025

2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

arXiv 2025

2025

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

arXiv 2025

2025

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

arXiv 2025

2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

arXiv 2025

2025

Grounding Computer Use Agents on Human Demonstrations

arXiv 2025

2025

Computer-Use Agents as Judges for Generative User Interface

arXiv 2025

2025

Reinforcement Learning in Vision: A Survey

arXiv 2025

2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

CVPR 2025

2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

arXiv 2025

2025

ROICtrl: Boosting Instance Control for Visual Generation

CVPR 2025

2024

Learning Long-form Video Prior via Generative Pre-Training

arXiv 2024

2024

Learning Video Context as Interleaved Multimodal Sequences

arXiv 2024

2024

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

arXiv 2024

2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

CVPR 2025

2024

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

ICCV 2023

2023

Bootstrapping SparseFormers from Vision Foundation Models

CVPR 2024

2023

Too Large; Data Reduction for Vision-Language Pre-Training

ICCV 2023

2023

UniVTG: Towards Unified Video-Language Temporal Grounding

ICCV 2023

2023

VisorGPT: Learning Visual Prior via Generative Pre-Training

arXiv 2023

2023

Egocentric Video-Language Pretraining

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 29 papers)