LoRA Training in the NTK Regime has No Spurious Local Minima

Theoretical analysis of LoRA fine-tuning in the NTK regime with N data points shows that LoRA with rank r ≳ sqrt(N) eliminates spurious local minima, so gradient descent can find low-rank solutions that generalize well.

Year
2024
Venue
arXiv 2024
Authors
3

Attribution

Abstract & full text
arxiv.org/abs/2402.11867v3
TL;DR
Semantic Scholar

Abstract

Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
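
To give a feel for the scale of the rank condition, here is a minimal Python sketch. It assumes the usual LoRA parameterization, in which a frozen weight matrix of shape d_out x d_in receives a trainable update B A with B of shape d_out x r and A of shape r x d_in. The abstract does not state the constant hidden in $r\gtrsim \sqrt{N}$, so the sketch sets it to 1 purely for illustration. The function names, the layer size 4096 x 4096, and N = 10,000 are all hypothetical choices and are not taken from the paper.

```python
import math


def illustrative_lora_rank(num_points: int) -> int:
    """Smallest integer r with r >= sqrt(num_points).

    Assumption: the constant hidden in "r >~ sqrt(N)" is taken to be 1.
    The abstract does not give the actual constant.
    """
    if num_points < 1:
        raise ValueError("num_points must be positive")
    r = math.isqrt(num_points)  # floor of the square root
    return r if r * r == num_points else r + 1


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in


if __name__ == "__main__":
    n_points = 10_000          # hypothetical fine-tuning set size N
    d_out = d_in = 4096        # hypothetical layer shape
    r = illustrative_lora_rank(n_points)
    print("illustrative rank:", r)
    print("LoRA trainable parameters:", lora_param_count(d_out, d_in, r))
    print("full fine-tuning parameters:", full_param_count(d_out, d_in))
```

Under these assumptions, a rank on the order of sqrt(N) still leaves the LoRA update far smaller than the full weight matrix, which is why a threshold that grows like sqrt(N) is compatible with parameter-efficient fine-tuning.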
