Qi Dai
- Papers
- 26
Authored papers (26)
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
arXiv 2026
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
arXiv 2026
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
arXiv 2026
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
arXiv 2026
ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation
arXiv 2026
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
arXiv 2026
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance
arXiv 2026
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark
arXiv 2026
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
arXiv 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
ICCV 2025
Subject-driven Video Generation via Disentangled Identity and Motion
arXiv 2025
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
arXiv 2025
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
ICCV 2025
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
arXiv 2025
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
arXiv 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation
CVPR 2025
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
arXiv 2024
A Survey on Video Diffusion Models
arXiv 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
ICCV 2023
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
ICCV 2023
ResFormer: Scaling ViTs with Multi-Resolution Training
CVPR 2023
SimMIM: A Simple Framework for Masked Image Modeling
CVPR 2022
Self-Supervised Learning with Swin Transformers
arXiv 2021
Frequent co-authors
10 from 26 papers