Amir Gholami

Papers
22

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

22

Residual Context Diffusion Language Models

arXiv 2026

2026

CDLM: Consistency Diffusion Language Models For Faster Sampling

arXiv 2025

2025

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

arXiv 2025

2025

ETS: Efficient Tree Search for Inference-Time Scaling

arXiv 2025

2025

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

arXiv 2024

2024

Squeezed Attention: Accelerating Long Context Length LLM Inference

arXiv 2024

2024

Efficient and Scalable Estimation of Tool Representations in Vector Space

arXiv 2024

2024

TinyAgent: Function Calling at the Edge

arXiv 2024

2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv 2024

2024

Speculative Decoding with Big Little Decoder

2023

An LLM Compiler for Parallel Function Calling

arXiv 2023

2023

SqueezeLLM: Dense-and-Sparse Quantization

arXiv 2023

2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

arXiv 2022

2022

I-BERT: Integer-only BERT Quantization

arXiv 2021

2021

Learned Token Pruning for Transformers

arXiv 2021

2021

Hessian-Aware Pruning and Optimal Neural Implant

arXiv 2021

2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

arXiv 2020

2020

ZeroQ: A Novel Zero Shot Quantization Framework

2020

PowerNorm: Rethinking Batch Normalization in Transformers

ICML 2020

2020

HAWQV3: Dyadic Neural Network Quantization

arXiv 2020

2020

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

NeurIPS 2020

2019

Affiliations

No known affiliations.

Frequent co-authors

10 (from 22 papers)