OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

OpenDiLoCo is an open-source implementation of the DiLoCo training method for large language models, demonstrating its effectiveness in a decentralized framework across multiple continents and for billion-parameter models.

Year: 2024
Venue: arXiv 2024
Authors: 3
Hosting: Abstract only

Attribution

Abstract & full text: arxiv.org/abs/2407.07852
TL;DR: Semantic Scholar

Abstract

OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiLoCo experiments within a scalable, decentralized training framework built on the Hivemind library. We demonstrate its effectiveness by training a model across two continents and three countries while maintaining 90-95% compute utilization. Additionally, we conduct ablation studies on the algorithm's compute efficiency and its scalability in the number of workers, and show that its gradients can be all-reduced in FP16 without any performance degradation. Furthermore, we scale OpenDiLoCo to 3x the size of the original work, demonstrating its effectiveness for billion-parameter models.
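
As a rough illustration of the FP16 all-reduce mentioned in the abstract, the sketch below simulates one DiLoCo-style outer step in plain Python: each worker forms a pseudo-gradient (global weights minus its locally trained weights), the pseudo-gradients are rounded to half precision and averaged, and the result updates the global weights. All function names here are hypothetical and are not taken from the OpenDiLoCo codebase or the Hivemind API. Plain SGD stands in for the outer optimizer as a simplifying assumption, and the learning rate is an illustrative value.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def pseudo_gradient(global_params: List[float], local_params: List[float]) -> List[float]:
    """One worker's pseudo-gradient: global weights minus locally trained weights."""
    return [g - l for g, l in zip(global_params, local_params)]


def all_reduce_mean_fp16(worker_grads: List[List[float]]) -> List[float]:
    """Simulated all-reduce: cast every value to FP16, then average across workers."""
    n = len(worker_grads)
    return [sum(to_fp16(v) for v in column) / n for column in zip(*worker_grads)]


def outer_step(global_params: List[float], avg_grad: List[float], outer_lr: float = 0.7) -> List[float]:
    """Outer update with plain SGD (an assumption; momentum is omitted for brevity)."""
    return [p - outer_lr * g for p, g in zip(global_params, avg_grad)]


# Toy run with two workers and three parameters (values are made up).
global_params = [1.0, -2.0, 0.5]
local_params_per_worker = [[0.9, -1.8, 0.5], [0.7, -2.2, 0.25]]

grads = [pseudo_gradient(global_params, lp) for lp in local_params_per_worker]
avg_grad = all_reduce_mean_fp16(grads)
new_global_params = outer_step(global_params, avg_grad)
```

In this toy run the FP16 rounding shifts the averaged pseudo-gradient only slightly from its full-precision value. That is merely consistent with the abstract's claim and is not evidence for it.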
