Kai Zhang

Papers
70

Authored papers

70

Reward Prediction with Factorized World States

arXiv 2026

2026

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

arXiv 2026

2026

OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

arXiv 2026

2026

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

arXiv 2026

2026

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

ICCV 2025

2025

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

arXiv 2025

2025

UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models

arXiv 2025

2025

Gaussian Mixture Flow Matching Models

arXiv 2025

2025

ARM: Adaptive Reasoning Model

arXiv 2025

2025

ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies

arXiv 2025

2025

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

arXiv 2025

2025

PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

arXiv 2025

2025

GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning

arXiv 2025

2025

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

arXiv 2025

2025

E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

arXiv 2025

2025

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

arXiv 2025

2025

Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

arXiv 2025

2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

arXiv 2025

2025

Is Extending Modality The Right Path Towards Omni-Modality?

arXiv 2025

2025

MindBridge: Scalable and Cross-Model Knowledge Editing via Memory-Augmented Modality

arXiv 2025

2025

Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection

arXiv 2025

2025

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

arXiv 2024

2024

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

arXiv 2024

2024

Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

arXiv 2024

2024

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

CVPR 2024 1

2024

Turbo3D: Ultra-fast Text-to-3D Generation

CVPR 2025 1

2024

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

arXiv 2024

2024

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

arXiv 2024

2024

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

arXiv 2024

2024

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

arXiv 2024

2024

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

arXiv 2024

2024

Evaluation of Retrieval-Augmented Generation: A Survey

arXiv 2024

2024

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

arXiv 2024

2024

Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

arXiv 2024

2024

Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

arXiv 2024

2024

How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?

arXiv 2024

2024

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning

arXiv 2024

2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

arXiv 2024

2024

Revealing the Barriers of Language Agents in Planning

arXiv 2024

2024

MegaScenes: Scene-Level View Synthesis at Scale

arXiv 2024

2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

arXiv 2024

2024

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

arXiv 2024

2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

arXiv 2024

2024

OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs

arXiv 2024

2024

Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

arXiv 2024

2024

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

arXiv 2024

2024

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

CVPR 2024 1

2023

Deep Equilibrium Diffusion Restoration with Parallel Sampling

CVPR 2024 1

2023

Large Language Model Instruction Following: A Survey of Progresses and Challenges

arXiv 2023

2023

Denoising Diffusion Models for Plug-and-Play Image Restoration

arXiv 2023

2023

BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks

arXiv 2023

2023

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

NeurIPS 2023 11

2023

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

ICCV 2023 1

2023

ImagenHub: Standardizing the evaluation of conditional image generation models

arXiv 2023

2023

Equivariant Multi-Modality Image Fusion

CVPR 2024 1

2023

PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology

arXiv 2023

2023

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

arXiv 2023

2023

Automatic Evaluation of Attribution by Large Language Models

arXiv 2023

2023

Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors

arXiv 2023

2023

LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination

arXiv 2023

2023

ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge

arXiv 2023

2023

Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

arXiv 2023

2023

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

arXiv 2022

2022

Unified Normalization for Accelerating and Stabilizing Transformers

arXiv 2022

2022

SwinIR: Image Restoration Using Swin Transformer

arXiv 2021

2021

Towards Flexible Blind JPEG Artifacts Removal

ICCV 2021 10

2021

Designing a Practical Degradation Model for Deep Blind Image Super-Resolution

ICCV 2021 10

2021

NeRF++: Analyzing and Improving Neural Radiance Fields

arXiv 2020

2020

Toward Convolutional Blind Denoising of Real Photographs

2018

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

arXiv 2016

2016

Affiliations

No known affiliations.
