Xianzhi Yu
- Papers: 8
Authored papers
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
arXiv 2025
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
arXiv 2025
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
arXiv 2025
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference
arXiv 2025
Behavioral Fingerprinting of Large Language Models
arXiv 2025
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
arXiv 2025
FlatQuant: Flatness Matters for LLM Quantization
arXiv 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
arXiv 2024
Affiliations
No known affiliations.
Frequent co-authors
10 from 8 papers