
FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

A novel Fast ANN-SNN conversion strategy (FAS) transforms large language models into spiking models with improved performance, reduced inference latency, and lower energy consumption.

Year
2025
Venue
arXiv 2025
Authors
6
Hosting
Abstract only (ARXIV-DEFAULT)


Attribution

Abstract & full text: arxiv.org/abs/2502.04405v2 (ARXIV-DEFAULT)
TL;DR: Semantic Scholar

Abstract

Spiking large language models (spiking LLMs) have been shown to be a good alternative to conventional LLMs in various scenarios. Existing methods for creating spiking LLMs, i.e., direct training and ANN-SNN conversion, often suffer from performance degradation and relatively high computational costs. To address these issues, we propose a novel Fast ANN-SNN conversion strategy (FAS) that transforms LLMs into spiking LLMs in two stages. The first stage performs full-parameter fine-tuning of pre-trained models, so no training from scratch is needed. The second stage introduces a coarse-to-fine calibration method to reduce conversion errors and improve accuracy. Experiments on both language and vision-language tasks across four different scales of LLMs demonstrate that FAS achieves state-of-the-art performance with significantly reduced inference latency and computational costs. Notably, FAS takes only eight timesteps to achieve accuracy 3% higher than that of the OPT-7B model, while reducing energy consumption by 96.63%. The source code is available at https://github.com/lc783/FAS.
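
For readers unfamiliar with ANN-SNN conversion, the following is a minimal, generic sketch of the idea the abstract builds on. It is not the FAS method and is not taken from the FAS repository. It assumes an integrate-and-fire neuron with soft reset, a constant input, and an initial membrane potential of half the threshold; the function names `if_neuron_rate` and `quantized_clipped_relu` are hypothetical. Under these assumptions, the firing rate over a small number of timesteps (eight here, matching the figure quoted in the abstract) reproduces a quantized, clipped ReLU activation.

```python
import numpy as np

def if_neuron_rate(x, threshold=1.0, timesteps=8):
    """Rate-decoded output of integrate-and-fire neurons driven by constant input x.

    Assumptions (illustrative only): soft reset (subtract the threshold on a
    spike) and an initial membrane potential of threshold / 2.
    """
    x = np.asarray(x, dtype=float)
    v = np.full_like(x, 0.5 * threshold)   # membrane potential
    spikes = np.zeros_like(x)              # spike count per neuron
    for _ in range(timesteps):
        v = v + x                          # integrate the input current
        fired = v >= threshold             # emit a spike where the threshold is reached
        spikes += fired
        v = v - fired * threshold          # soft reset: subtract the threshold
    return threshold * spikes / timesteps

def quantized_clipped_relu(x, threshold=1.0, timesteps=8):
    """ANN-side activation that the spiking neuron above approximates."""
    x = np.asarray(x, dtype=float)
    levels = np.clip(np.floor(x * timesteps / threshold + 0.5), 0, timesteps)
    return threshold * levels / timesteps

xs = np.array([-0.5, 0.0, 0.2, 0.55, 0.8, 1.0, 2.0])
print(if_neuron_rate(xs))
print(quantized_clipped_relu(xs))
```

With only a few timesteps the spiking output is restricted to a handful of discrete levels, which is one source of the conversion error that calibration steps in conversion pipelines aim to reduce.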
