Kurt Keutzer
- Papers: 62
Authored papers (62)
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
arXiv 2026
Residual Context Diffusion Language Models
arXiv 2026
V_1: Unifying Generation and Self-Verification for Parallel Reasoners
arXiv 2026
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
arXiv 2025
Why Do Multi-Agent LLM Systems Fail?
arXiv 2025
S*: Test Time Scaling for Code Generation
arXiv 2025
Learning Adaptive Parallel Reasoning with Language Models
arXiv 2025
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
arXiv 2025
CDLM: Consistency Diffusion Language Models For Faster Sampling
arXiv 2025
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
arXiv 2025
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation
arXiv 2025
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
arXiv 2025
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
arXiv 2025
Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility
arXiv 2025
ETS: Efficient Tree Search for Inference-Time Scaling
arXiv 2025
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
arXiv 2024
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
arXiv 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
arXiv 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
arXiv 2024
TinyAgent: Function Calling at the Edge
arXiv 2024
Magic-Me: Identity-Specific Video Customized Diffusion
arXiv 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
arXiv 2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
arXiv 2024
RouterBench: A Benchmark for Multi-LLM Routing System
arXiv 2024
LLoCO: Learning Long Contexts Offline
arXiv 2024
Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving
arXiv 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
arXiv 2024
Efficient and Scalable Estimation of Tool Representations in Vector Space
arXiv 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
arXiv 2024
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
ICCV 2023 1
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
ICCV 2025
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
arXiv 2023
An LLM Compiler for Parallel Function Calling
arXiv 2023
SqueezeLLM: Dense-and-Sparse Quantization
arXiv 2023
Q-Diffusion: Quantizing Diffusion Models
ICCV 2023 1
Speculative Decoding with Big Little Decoder
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
ICCV 2023 1
CVPR 2023 Text Guided Video Editing Competition
arXiv 2023
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration
arXiv 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
arXiv 2023
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
arXiv 2022
The ArtBench Dataset: Benchmarking Generative Models with Artworks
arXiv 2022
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
ICCV 2023 1
Multitask Vision-Language Prompt Tuning
arXiv 2022
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
CVPR 2023 1
Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data
arXiv 2022
I-BERT: Integer-only BERT Quantization
arXiv 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
arXiv 2021
Learned Token Pruning for Transformers
arXiv 2021
Hessian-Aware Pruning and Optimal Neural Implant
arXiv 2021
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
arXiv 2020
HAWQV3: Dyadic Neural Network Quantization
arXiv 2020
ZeroQ: A Novel Zero Shot Quantization Framework
PowerNorm: Rethinking Batch Normalization in Transformers
ICML 2020 1
Cross-Domain Sentiment Classification with Contrastive Learning and Mutual Information Maximization
arXiv 2020
CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs
arXiv 2020
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
NeurIPS 2020 12
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
arXiv 2016
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
arXiv 2014
Affiliations
Frequent co-authors
10 from 62 papers