Amir Gholami
- Papers: 22
Authored papers (22)
Residual Context Diffusion Language Models
arXiv 2026
CDLM: Consistency Diffusion Language Models For Faster Sampling
arXiv 2025
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
arXiv 2025
ETS: Efficient Tree Search for Inference-Time Scaling
arXiv 2025
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
arXiv 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
arXiv 2024
Efficient and Scalable Estimation of Tool Representations in Vector Space
arXiv 2024
TinyAgent: Function Calling at the Edge
arXiv 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
arXiv 2024
Speculative Decoding with Big Little Decoder
An LLM Compiler for Parallel Function Calling
arXiv 2023
SqueezeLLM: Dense-and-Sparse Quantization
arXiv 2023
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
arXiv 2022
I-BERT: Integer-only BERT Quantization
arXiv 2021
Learned Token Pruning for Transformers
arXiv 2021
Hessian-Aware Pruning and Optimal Neural Implant
arXiv 2021
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
arXiv 2020
ZeroQ: A Novel Zero Shot Quantization Framework
PowerNorm: Rethinking Batch Normalization in Transformers
ICML 2020
HAWQV3: Dyadic Neural Network Quantization
arXiv 2020
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
NeurIPS 2020
Affiliations
Frequent co-authors
10 from 22 papers