Jiangning Zhang

Papers
32

Authored papers

32

PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

arXiv 2026

2026

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

arXiv 2026

2026

Towards Customized Multimodal Role-Play

arXiv 2026

2026

L2P: Unlocking Latent Potential for Pixel Generation

arXiv 2026

2026

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

arXiv 2025

2025

OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

arXiv 2025

2025

Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation

arXiv 2025

2025

Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×

arXiv 2025

2025

DiP: Taming Diffusion Models in Pixel Space

arXiv 2025

2025

VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

arXiv 2025

2025

Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow

arXiv 2025

2025

StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

arXiv 2025

2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

ICCV 2025

2025

SVFR: A Unified Framework for Generalized Video Face Restoration

CVPR 2025

2025

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

ICCV 2025

2025

Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

arXiv 2025

2025

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

CVPR 2025

2024

MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

arXiv 2024

2024

AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

arXiv 2024

2024

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

arXiv 2024

2024

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

arXiv 2024

2024

EMOv2: Pushing 5M Vision Model Frontier

arXiv 2024

2024

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

CVPR 2025

2024

CustAny: Customizing Anything from A Single Example

CVPR 2025

2024

Learning Multi-view Anomaly Detection

arXiv 2024

2024

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

arXiv 2024

2024

Point Cloud Mamba: Point Cloud Learning via State Space Model

arXiv 2024

2024

MotionMaster: Training-free Camera Motion Transfer For Video Generation

arXiv 2024

2024

LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

arXiv 2024

2024

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

arXiv 2024

2024

Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption

ICCV 2023

2023

Rethinking Mobile Block for Efficient Attention-based Models

ICCV 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 32 papers