Zhihang Yuan
- Papers: 12
Authored papers
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
arXiv 2025
VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate
arXiv 2025
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
arXiv 2025
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
arXiv 2025
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
arXiv 2025
LLM Inference Unveiled: Survey and Roofline Model Insights
arXiv 2024
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
ICCV 2025
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
arXiv 2024
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization
arXiv 2024
PB-LLM: Partially Binarized Large Language Models
arXiv 2023
Post-training Quantization on Diffusion Models
CVPR 2023
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
CVPR 2023