Kai Wang
TIGER-Lab researcher; co-author on TIGER-Lab benchmark papers.
- Role: researcher
- Currently at: TIGER-Lab
- Unknown
- GitHub: Unknown
- Scholar: scholar.google.com/scholar
- Papers: 49
Authored papers (49)
Audio-Visual Intelligence in Large Foundation Models
arXiv 2026
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
arXiv 2026
Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning
arXiv 2026
Enhance-A-Video: Better Generated Video for Free
arXiv 2025
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
arXiv 2025
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
arXiv 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
arXiv 2025
StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models
arXiv 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
arXiv 2025
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
Neural-Driven Image Editing
arXiv 2025
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
arXiv 2025
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
arXiv 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
arXiv 2025
Optimizing for the Shortest Path in Denoising Diffusion Model
CVPR 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Recurrent Diffusion for Large-Scale Parameter Generation
arXiv 2025
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
arXiv 2025
Info-Coevolution: An Efficient Framework for Data Model Coevolution
arXiv 2025
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
arXiv 2025
Diversity Has Always Been There in Your Visual Autoregressive Models
arXiv 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
NeurIPS 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
arXiv 2024
Real-Time Video Generation with Pyramid Attention Broadcast
arXiv 2024
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
arXiv 2024
HunyuanVideo: A Systematic Framework For Large Video Generative Models
arXiv 2024
Aligning Large Language Models with Representation Editing: A Control Perspective
arXiv 2024
Neural Network Diffusion
arXiv 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
arXiv 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
arXiv 2024
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
arXiv 2024
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
arXiv 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
arXiv 2024
MLLMs-Augmented Visual-Language Representation Learning
arXiv 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
arXiv 2023
Dataset Quantization
ICCV 2023
DREAM: Efficient Dataset Distillation by Representative Matching
ICCV 2023
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
ICCV 2023
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
arXiv 2023
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
CVPR 2023
Bioformer: an efficient transformer language model for biomedical text mining
arXiv 2023
DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis
arXiv 2023
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models
arXiv 2023
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
ICCV 2023
Expanding Small-Scale Datasets with Guided Imagination
Multi-Domain Dialogue Acts and Response Co-Generation
Affiliations
Frequent co-authors
10 from 49 papers