Balancing LoRA Performance and Efficiency with Simple Shard Sharing

Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), effectively reduce the number of trainable parameters in Large Language Models (LLMs). However, as model scales continue to grow, the demand for computational resources remains a significant challenge. Existing LoRA variants often struggle to strike an optimal balance between adaptability (model performance and convergence speed) and efficiency (computational overhead, memory usage, and initialization time). This paper introduces FOSSIL(Framework for Optimal Shard Sharing Integration in LoRA), a novel PEFT approach that addresses this trade-off through a simple shard-sharing mechanism. FOSSIL leverages the insight that a low-rank adaptation can be achieved by decomposing the weight matrix into multiple fragment matrices and utilizing a shared, trainable common fragment. This method constructs the low-rank update matrix through the replication of these shared, partitioned shards. We also propose a hardware-efficient and broadly applicable implementation for FOSSIL. Extensive experiments conducted on a range of tasks, alongside a systematic analysis of computational performance, demonstrate FOSSIL's superiority. The results show that FOSSIL significantly outperforms standard LoRA and its prominent variants in both model performance metrics and computational efficiency, including initialization speed and training throughput. By effectively balancing expressive power and resource utilization, FOSSIL offers a compelling solution for efficiently adapting large-scale models.

Balancing LoRA Performance and Efficiency with Simple Shard Sharing

Abstract

Authors