Zhenye Gan
- Papers: 11
Authored papers (11)
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection
arXiv 2025
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
ICCV 2025
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation
arXiv 2025
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×
arXiv 2025
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
CVPR 2025
Efficient Multimodal Large Language Models: A Survey
arXiv 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
arXiv 2024
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
arXiv 2024
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
arXiv 2024
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection
arXiv 2024
A Survey on Benchmarks of Multimodal Large Language Models
arXiv 2024
Frequent co-authors
10 from 11 papers