Zhihao Jia

Papers: 11

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

arXiv 2025

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

arXiv 2025

CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models

arXiv 2025

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

arXiv 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

arXiv 2024

MagicPIG: LSH Sampling for Efficient LLM Generation

arXiv 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

arXiv 2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

arXiv 2024

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

arXiv 2024

GradSign: Model Performance Inference with Theoretical Insights

2021

Affiliations

No known affiliations.

Frequent co-authors

from 11 papers

Zhihao Zhang

Zhuoming Chen

Beidi Chen

Gabriele Oliaro

Avner May

Lijie Yang

Max Ryabinin

Ravi Netravali

Ruslan Svirschevski

Xupeng Miao