Kai Zhang
Authored papers (70)
Reward Prediction with Factorized World States
arXiv 2026
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
arXiv 2026
OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
arXiv 2026
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
arXiv 2026
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
ICCV 2025
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
arXiv 2025
UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models
arXiv 2025
Gaussian Mixture Flow Matching Models
arXiv 2025
ARM: Adaptive Reasoning Model
arXiv 2025
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies
arXiv 2025
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
arXiv 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
arXiv 2025
GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning
arXiv 2025
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
arXiv 2025
E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training
arXiv 2025
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
arXiv 2025
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
arXiv 2025
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
arXiv 2025
Is Extending Modality The Right Path Towards Omni-Modality?
arXiv 2025
MindBridge: Scalable and Cross-Model Knowledge Editing via Memory-Augmented Modality
arXiv 2025
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
arXiv 2025
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
arXiv 2024
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
arXiv 2024
Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation
arXiv 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024 1
Turbo3D: Ultra-fast Text-to-3D Generation
CVPR 2025 1
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
arXiv 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
arXiv 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
arXiv 2024
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
arXiv 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
arXiv 2024
Evaluation of Retrieval-Augmented Generation: A Survey
arXiv 2024
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
arXiv 2024
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
arXiv 2024
Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
arXiv 2024
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
arXiv 2024
Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
arXiv 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
arXiv 2024
Revealing the Barriers of Language Agents in Planning
arXiv 2024
MegaScenes: Scene-Level View Synthesis at Scale
arXiv 2024
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
arXiv 2024
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
arXiv 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
arXiv 2024
OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
arXiv 2024
Biomedical SAM 2: Segment Anything in Biomedical Images and Videos
arXiv 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
arXiv 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024 1
Deep Equilibrium Diffusion Restoration with Parallel Sampling
CVPR 2024 1
Large Language Model Instruction Following: A Survey of Progresses and Challenges
arXiv 2023
Denoising Diffusion Models for Plug-and-Play Image Restoration
arXiv 2023
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
arXiv 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
NeurIPS 2023 11
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
ICCV 2023 1
ImagenHub: Standardizing the evaluation of conditional image generation models
arXiv 2023
Equivariant Multi-Modality Image Fusion
CVPR 2024 1
PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology
arXiv 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
arXiv 2023
Automatic Evaluation of Attribution by Large Language Models
arXiv 2023
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
arXiv 2023
LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination
arXiv 2023
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
arXiv 2023
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
arXiv 2023
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results
arXiv 2022
Unified Normalization for Accelerating and Stabilizing Transformers
arXiv 2022
SwinIR: Image Restoration Using Swin Transformer
arXiv 2021
Towards Flexible Blind JPEG Artifacts Removal
ICCV 2021 10
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution
ICCV 2021 10
NeRF++: Analyzing and Improving Neural Radiance Fields
arXiv 2020
Toward Convolutional Blind Denoising of Real Photographs
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising
arXiv 2016
Affiliations
Frequent co-authors
10 from 70 papers