Kurt Keutzer

Authored papers

62

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

arXiv 2026

2026

Residual Context Diffusion Language Models

arXiv 2026

2026

V_1: Unifying Generation and Self-Verification for Parallel Reasoners

arXiv 2026

2026

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

arXiv 2025

2025

Why Do Multi-Agent LLM Systems Fail?

arXiv 2025

2025

S*: Test Time Scaling for Code Generation

arXiv 2025

2025

Learning Adaptive Parallel Reasoning with Language Models

arXiv 2025

2025

Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals

arXiv 2025

2025

CDLM: Consistency Diffusion Language Models For Faster Sampling

arXiv 2025

2025

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

arXiv 2025

2025

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

arXiv 2025

2025

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

arXiv 2025

2025

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

arXiv 2025

2025

Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

arXiv 2025

2025

ETS: Efficient Tree Search for Inference-Time Scaling

arXiv 2025

2025

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

arXiv 2024

2024

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

arXiv 2024

2024

LLM Inference Unveiled: Survey and Roofline Model Insights

arXiv 2024

2024

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

arXiv 2024

2024

TinyAgent: Function Calling at the Edge

arXiv 2024

2024

Magic-Me: Identity-Specific Video Customized Diffusion

arXiv 2024

2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv 2024

2024

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

arXiv 2024

2024

RouterBench: A Benchmark for Multi-LLM Routing System

arXiv 2024

2024

LLoCO: Learning Long Contexts Offline

arXiv 2024

2024

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

arXiv 2024

2024

Squeezed Attention: Accelerating Long Context Length LLM Inference

arXiv 2024

2024

Efficient and Scalable Estimation of Tool Representations in Vector Space

arXiv 2024

2024

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

arXiv 2024

2024

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

ICCV 2023 1

2023

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

ICCV 2025

2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

arXiv 2023

2023

An LLM Compiler for Parallel Function Calling

arXiv 2023

2023

SqueezeLLM: Dense-and-Sparse Quantization

arXiv 2023

2023

Q-Diffusion: Quantizing Diffusion Models

ICCV 2023 1

2023

Speculative Decoding with Big Little Decoder

2023

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

ICCV 2023 1

2023

CVPR 2023 Text Guided Video Editing Competition

arXiv 2023

2023

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

arXiv 2023

2023

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

arXiv 2023

2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

arXiv 2022

2022

The ArtBench Dataset: Benchmarking Generative Models with Artworks

arXiv 2022

2022

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

ICCV 2023 1

2022

Multitask Vision-Language Prompt Tuning

arXiv 2022

2022

NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

CVPR 2023 1

2022

Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data

arXiv 2022

2022

I-BERT: Integer-only BERT Quantization

arXiv 2021

2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

arXiv 2021

2021

Learned Token Pruning for Transformers

arXiv 2021

2021

Hessian-Aware Pruning and Optimal Neural Implant

arXiv 2021

2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

arXiv 2020

2020

HAWQV3: Dyadic Neural Network Quantization

arXiv 2020

2020

ZeroQ: A Novel Zero Shot Quantization Framework

2020

PowerNorm: Rethinking Batch Normalization in Transformers

ICML 2020 1

2020

Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization

arXiv 2020

2020

CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

arXiv 2020

2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

2019

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

NeurIPS 2020 12

2019

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

2018

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

arXiv 2016

2016

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

arXiv 2014

2014

Affiliations

No known affiliations.
