Sehoon Kim
Authored papers
ETS: Efficient Tree Search for Inference-Time Scaling
arXiv 2025
TinyAgent: Function Calling at the Edge
arXiv 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
arXiv 2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
arXiv 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
arXiv 2024
Efficient and Scalable Estimation of Tool Representations in Vector Space
arXiv 2024
An LLM Compiler for Parallel Function Calling
arXiv 2023
SqueezeLLM: Dense-and-Sparse Quantization
arXiv 2023
Speculative Decoding with Big Little Decoder
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
arXiv 2022
I-BERT: Integer-only BERT Quantization
arXiv 2021
Learned Token Pruning for Transformers
arXiv 2021
Hessian-Aware Pruning and Optimal Neural Implant
arXiv 2021
Frequent co-authors
10 from 13 papers
Amir Gholami
Kurt Keutzer
Michael W. Mahoney
Coleman Hooper
Nicholas Lee
Suhong Moon
Karttikeya Mangalam
Sheng Shen
Gopala Anumanchipalli
Hiva Mohammadzadeh