Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate the practicality of MX data formats as a drop-in replacement for baseline FP32 in AI inference and training with low user friction. We also show the first instance of training generative language models with sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
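The idea of pairing one per-block scaling factor with narrow per-element values can be illustrated with a minimal sketch. This is not the paper's reference implementation or the exact MX encoding: the block length of 32, the 8-bit signed integer elements, the power-of-two scale, and all function names (`quantize_block`, `dequantize_block`, `quantize`, `dequantize`) are assumptions chosen for illustration.

```python
import math

BLOCK_SIZE = 32  # assumed block length, for illustration only
ELEM_BITS = 8    # assumed signed-integer element width


def quantize_block(values, elem_bits=ELEM_BITS):
    """Quantize one block to (shared_exponent, integer_elements)."""
    qmax = 2 ** (elem_bits - 1) - 1
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale that keeps amax / scale within qmax.
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    # Round each element to the nearest integer step and clamp to range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exp, q


def dequantize_block(exp, q):
    """Reconstruct approximate real values from one quantized block."""
    scale = 2.0 ** exp
    return [x * scale for x in q]


def quantize(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each independently."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(blocks):
    """Concatenate the reconstructed values of all blocks."""
    out = []
    for exp, q in blocks:
        out.extend(dequantize_block(exp, q))
    return out
```

Because each block carries its own scale, a block of small values keeps its precision even when another block in the same tensor holds much larger values. The per-element rounding error in this sketch is at most half of the block's scale.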
Microscaling Data Formats for Deep Learning
Microscaling data formats reduce computational and storage costs while maintaining model accuracy and enabling sub-8-bit training for generative language models.
- Year
- 2023
- Venue
- arXiv 2023
- Authors
- 33
Attribution
- Abstract & full text
- arxiv.org/abs/2310.10537v3
- TL;DR
- Semantic Scholar
Authors
Bita Darvish Rouhani, Eric Chung, Rasoul Shafipour, Venmugil Elango, Maxim Naumov, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Colin Verrilli, Ralph Wittig, Doug Burger