Sehoon Kim

Papers: 13

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

ETS: Efficient Tree Search for Inference-Time Scaling

arXiv 2025

2025

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

arXiv 2024

2024

Squeezed Attention: Accelerating Long Context Length LLM Inference

arXiv 2024

2024

Efficient and Scalable Estimation of Tool Representations in Vector Space

arXiv 2024

2024

TinyAgent: Function Calling at the Edge

arXiv 2024

2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv 2024

2024

An LLM Compiler for Parallel Function Calling

arXiv 2023

2023

SqueezeLLM: Dense-and-Sparse Quantization

arXiv 2023

2023

Speculative Decoding with Big Little Decoder

2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

arXiv 2022

2022

I-BERT: Integer-only BERT Quantization

arXiv 2021

2021

Learned Token Pruning for Transformers

arXiv 2021

2021

Hessian-Aware Pruning and Optimal Neural Implant

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

from 13 papers

Amir Gholami

Kurt Keutzer

Michael W. Mahoney

Coleman Hooper

Nicholas Lee

Suhong Moon

Karttikeya Mangalam

Sheng Shen

Gopala Anumanchipalli

2 shared papers

Hiva Mohammadzadeh

2 shared papers