Low-rank adapters (LoRA) enable finetuning of large models with only a small number of parameters. However, they often suffer from an ill-conditioned loss landscape, leading to difficult optimization. Prior work addresses these challenges by aligning adapter updates with full finetuning gradients via custom optimizers, but these methods lack the flexibility to accommodate new adapter architectures and are computationally expensive. We instead introduce OP-LoRA, a novel method which replaces each LoRA adapter with weights predicted by an extra MLP, which is discarded after training. This temporarily allows additional parameters during training to improve optimization, yet requires less wall time than custom optimizers and zero extra cost at inference time because the MLP is discarded. Crucially, extending OP-LoRA to other adapters is as simple as modifying the size of the prediction head for each new adapter type. We show that OP-LoRA allows the optimization to adaptively increase or decrease step size, improving performance and decreasing sensitivity to learning rate. On both small and large-scale LoRA tuning tasks, we observe consistent performance gains of OP-LoRA relative to LoRA and its variants. We achieve especially notable improvements in image generation, with OP-LoRA CMMD scores improving by up to 15 points relative to LoRA. This allows OP-LoRA to achieve the performance of LoRA with half of the inference parameters.
OP-LoRA: The Blessing of Dimensionality
Low-rank adapters (LoRA) enable finetuning of large models with only a small number of parameters. However, they often suffer from an ill-conditioned loss landscape, leading to difficult optimization.
- Preview

- Year
- 2024
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2412.10362CC-BY-4.0
- TL;DR
- Semantic Scholar