Zhihao Jia
Papers: 11
Authored papers
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
arXiv 2025
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
arXiv 2025
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models
arXiv 2025
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
arXiv 2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
arXiv 2024
MagicPIG: LSH Sampling for Efficient LLM Generation
arXiv 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
arXiv 2024
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
arXiv 2024
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
arXiv 2024
GradSign: Model Performance Inference with Theoretical Insights