Qi Dai

Papers
26

Authored papers

26

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

arXiv 2026

2026

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

arXiv 2026

2026

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

arXiv 2026

2026

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

arXiv 2026

2026

ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation

arXiv 2026

2026

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

arXiv 2026

2026

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

arXiv 2026

2026

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

arXiv 2026

2026

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

arXiv 2025

2025

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

ICCV 2025

2025

Subject-driven Video Generation via Disentangled Identity and Motion

arXiv 2025

2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

arXiv 2025

2025

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

ICCV 2025

2025

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

arXiv 2025

2025

FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction

arXiv 2025

2025

StableAnimator: High-Quality Identity-Preserving Human Image Animation

CVPR 2025 1

2024

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents

ICCV 2025

2024

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

arXiv 2024

2024

A Survey on Video Diffusion Models

arXiv 2023

2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

ICCV 2023 1

2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

ICCV 2023 1

2023

MotionEditor: Editing Video Motion via Content-Aware Diffusion

CVPR 2024 1

2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

ICCV 2023 1

2023

ResFormer: Scaling ViTs with Multi-Resolution Training

CVPR 2023 1

2022

SimMIM: A Simple Framework for Masked Image Modeling

CVPR 2022 1

2021

Self-Supervised Learning with Swin Transformers

arXiv 2021

2021

Affiliations

No known affiliations.
