Michael W. Mahoney

Papers
22

Authored papers

22

Residual Context Diffusion Language Models

arXiv 2026

2026

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

arXiv 2025

2025

ETS: Efficient Tree Search for Inference-Time Scaling

arXiv 2025

2025

Chronos: Learning the Language of Time Series

arXiv 2024

2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv 2024

2024

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

arXiv 2024

2024

Squeezed Attention: Accelerating Long Context Length LLM Inference

arXiv 2024

2024

An LLM Compiler for Parallel Function Calling

arXiv 2023

2023

SqueezeLLM: Dense-and-Sparse Quantization

arXiv 2023

2023

Speculative Decoding with Big Little Decoder

2023

Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

arXiv 2023

2023

A Three-regime Model of Network Pruning

arXiv 2023

2023

Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching

arXiv 2023

2023

Learning Physical Models that Can Respect Conservation Laws

arXiv 2023

2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

arXiv 2022

2022

I-BERT: Integer-only BERT Quantization

arXiv 2021

2021

Hessian-Aware Pruning and Optimal Neural Implant

arXiv 2021

2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

arXiv 2020

2020

HAWQV3: Dyadic Neural Network Quantization

arXiv 2020

2020

ZeroQ: A Novel Zero Shot Quantization Framework

2020

PowerNorm: Rethinking Batch Normalization in Transformers

ICML 2020

2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

NeurIPS 2020

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 22 papers