Torsten Hoefler
Papers: 18
Authored papers
WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
arXiv 2025
Reasoning Language Models: A Blueprint
arXiv 2025
HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs
arXiv 2025
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
arXiv 2025
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
arXiv 2024
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
arXiv 2024
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
arXiv 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
arXiv 2024
All models are wrong, some are useful: Model Selection with Limited Labels
arXiv 2024
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
arXiv 2024
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
arXiv 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
arXiv 2023
Co-design Hardware and Algorithm for Vector Search
arXiv 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
arXiv 2023
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
arXiv 2022
Spatial Mixture-of-Experts
arXiv 2022
Neural Parameter Allocation Search
Data Movement Is All You Need: A Case Study on Optimizing Transformers
arXiv 2020