FlashDecoding++: Faster Large Language Model Inference on GPUs
Large Language Models (LLMs) are becoming increasingly important in various domains. However, the following challenges remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update.
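To make the first challenge concrete, here is a minimal sketch of a partial (chunked) softmax in plain Python. It assumes the common running-max rescaling scheme used in chunked attention kernels; the function names `softmax` and `chunked_softmax` and the `chunk_size` parameter are hypothetical and are not taken from the paper. The point it illustrates is that each chunk's contribution depends on a maximum shared across chunks, so previously accumulated results must be rescaled whenever that maximum changes.

```python
import math


def softmax(xs):
    """Reference softmax over the whole vector, stabilized by the global max."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def chunked_softmax(xs, chunk_size):
    """Softmax computed chunk by chunk with a running max and running sum.

    Illustrative only: an assumed running-max scheme, not the paper's kernel.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(xs), chunk_size):
        chunk = xs[start:start + chunk_size]
        new_max = max(running_max, max(chunk))
        # Synchronization step: the sum accumulated so far was computed
        # relative to the old max, so it is rescaled to the new max before
        # this chunk's contribution is added.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in chunk)
        running_max = new_max
    # Final normalization uses the max and sum agreed on by all chunks.
    return [math.exp(x - running_max) / running_sum for x in xs]


scores = [1.0, 3.0, -2.0, 0.5, 7.0]
full = softmax(scores)
partial = chunked_softmax(scores, chunk_size=2)
```

In this sketch the rescaling runs once per chunk and is inherently sequential; on a GPU, where chunks are processed in parallel, the same dependency becomes a synchronization between partial results.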
- Year: 2023
- Hosting: External source (license unknown)