Kai Wang

TIGER-Lab researcher; co-author on TIGER-Lab benchmark papers.

Role: researcher
Currently at: TIGER-Lab
Twitter: Unknown
GitHub: Unknown
Papers: 49

Attribution

Affiliations & profile
scholar.google.com/scholar

Authored papers (49)

Audio-Visual Intelligence in Large Foundation Models

arXiv 2026

2026

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

arXiv 2026

2026

Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning

arXiv 2026

2026

Enhance-A-Video: Better Generated Video for Free

arXiv 2025

2025

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

arXiv 2025

2025

InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration

arXiv 2025

2025

DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation

arXiv 2025

2025

StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

arXiv 2025

2025

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization

arXiv 2025

2025

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

arXiv 2025

2025

HunyuanImage 3.0 Technical Report

arXiv 2025

2025

Neural-Driven Image Editing

arXiv 2025

2025

REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

arXiv 2025

2025

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

arXiv 2025

2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

arXiv 2025

2025

Optimizing for the Shortest Path in Denoising Diffusion Model

CVPR 2025

2025

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

2025

Recurrent Diffusion for Large-Scale Parameter Generation

arXiv 2025

2025

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis

arXiv 2025

2025

Info-Coevolution: An Efficient Framework for Data Model Coevolution

arXiv 2025

2025

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

arXiv 2025

2025

Diversity Has Always Been There in Your Visual Autoregressive Models

arXiv 2025

2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv 2025

2025

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

NeurIPS 2024

2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

arXiv 2024

2024

Real-Time Video Generation with Pyramid Attention Broadcast

arXiv 2024

2024

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

arXiv 2024

2024

HunyuanVideo: A Systematic Framework For Large Video Generative Models

arXiv 2024

2024

Aligning Large Language Models with Representation Editing: A Control Perspective

arXiv 2024

2024

Neural Network Diffusion

arXiv 2024

2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

arXiv 2024

2024

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

arXiv 2024

2024

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

arXiv 2024

2024

ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

arXiv 2024

2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

arXiv 2024

2024

MLLMs-Augmented Visual-Language Representation Learning

arXiv 2023

2023

InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

arXiv 2023

2023

Dataset Quantization

ICCV 2023

2023

DREAM: Efficient Dataset Distillation by Representative Matching

ICCV 2023

2023

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models

ICCV 2023

2023

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

arXiv 2023

2023

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

CVPR 2023

2023

Bioformer: an efficient transformer language model for biomedical text mining

arXiv 2023

2023

DVPT: Dynamic Visual Prompt Tuning of Large Pre-trained Models for Medical Image Analysis

arXiv 2023

2023

Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models

arXiv 2023

2023

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

ICCV 2023

2022

Expanding Small-Scale Datasets with Guided Imagination

2022

Multi-Domain Dialogue Acts and Response Co-Generation

2020

Affiliations

Currently at: TIGER-Lab (researcher · university lab)

Frequent co-authors: 10 (from 49 papers)