Deep learning is part of a broader family of machine learning methods based on artificial neural networks. It has enabled everything from voice and image recognition tools to advances in drug discovery and toxicology and better financial fraud detection.
As the applications of machine learning become bigger, more complex and increasingly ubiquitous in our modern, digital age, neural networks have grown tremendously in size, consisting of trillions of connections. To train these models faster, researchers typically distribute the training effort over many computers or graphics processing units (GPUs). Yet, just like humans who collaborate to solve a task, collaborating computers suffer from communication overhead.
“Because the neural networks that are trained are so large, the communication required between computers to achieve an accurate model can amount to many petabytes. Researchers have long been trying to find ways to compress the bandwidth needed while still allowing accurate training,” said Martin Jaggi, Head of the Machine Learning and Optimization Laboratory (MLO), part of the School of Computer and Communication Sciences (IC).
New EPFL algorithm developed
PowerSGD is an algorithm developed by PhD students Thijs Vogels and Sai Praneeth Karimireddy, who work with Professor Jaggi. Its name comes from the power method, which repeatedly multiplies a matrix by a vector in order to capture the matrix's main directions. Here, the EPFL researchers have applied it to the changes in the neural network model, allowing a drastic reduction in the communication required in distributed training. When applied to standard deep learning benchmarks such as image recognition or transformer models for text, the algorithm saves up to 99% of the communication while retaining good model accuracy.
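As a rough illustration of the idea, the following Python sketch compresses a single matrix of model changes with one step of the power method. It is a simplified, single-machine, rank-1 version and not the researchers' implementation, which the article does not describe in detail. The function names (rank1_compress, decompress, communication_saving) and the matrix sizes are assumptions chosen for the example.

```python
import math


def matvec(M, v):
    """Return M @ v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matTvec(M, u):
    """Return M^T @ u for a matrix stored as a list of rows."""
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]


def rank1_compress(M, q, steps=1):
    """Approximate M (n x m) by the outer product of p (length n) and q (length m).

    Each step multiplies M by the current vector, which is the power method.
    Only p and q would need to be sent between workers, not M itself.
    """
    p = [0.0] * len(M)
    for _ in range(steps):
        p = matvec(M, q)
        norm = math.sqrt(sum(x * x for x in p))
        if norm == 0.0:
            # M q vanished, so there is no direction to capture.
            return p, [0.0] * len(q)
        p = [x / norm for x in p]  # normalise the left factor
        q = matTvec(M, p)          # right factor carries the scale
    return p, q


def decompress(p, q):
    """Rebuild the rank-1 approximation p q^T."""
    return [[pi * qj for qj in q] for pi in p]


def communication_saving(n, m, rank):
    """Fraction of values saved by sending rank*(n+m) numbers instead of n*m."""
    return 1.0 - rank * (n + m) / (n * m)


if __name__ == "__main__":
    # Hypothetical 3 x 2 matrix of model changes and an arbitrary starting vector.
    M = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
    p, q = rank1_compress(M, [1.0, 1.0])
    print(decompress(p, q))
    print(communication_saving(1000, 1000, 1))
```

For a hypothetical 1000 x 1000 layer, sending the two rank-1 factors means transmitting 2,000 numbers instead of 1,000,000, a saving of 99.8% for that layer. Real models mix many layer shapes, so the overall saving differs.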
“Machine learning models are only going to get bigger. Developing new training algorithms which can scale to such models and reduce energy requirements is a hugely important topic. In addition to PyTorch, we were happy to learn that our new algorithm has also recently been used in OpenAI’s DALL-E, which can generate creative images from text,” said EPFL’s Thijs Vogels.
PyTorch 1.8 with PowerSGD
PyTorch is an open source machine learning library used in around 80% of academic publications that use deep learning. Its newest version, 1.8, includes the EPFL-developed PowerSGD for the first time.
As a result, the more communication-efficient training scheme, which works for any deep learning model, is now readily available to users in industry and research, who can activate communication compression with a simple software switch.
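In PyTorch, this switch takes the form of a communication hook registered on a DistributedDataParallel model. The sketch below shows roughly what that looks like. The helper name enable_powersgd and the parameter values are assumptions made for illustration, and the model is assumed to be already wrapped in DistributedDataParallel with a process group initialised.

```python
def enable_powersgd(ddp_model, rank=1, start_iter=10):
    """Register PowerSGD gradient compression on a DistributedDataParallel model.

    `rank` and `start_iter` are illustrative values, not recommendations.
    """
    # Imported here so the helper can be defined on machines without PyTorch.
    from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

    state = powerSGD.PowerSGDState(
        process_group=None,              # None selects the default process group
        matrix_approximation_rank=rank,  # lower rank means stronger compression
        start_powerSGD_iter=start_iter,  # plain all-reduce before this iteration
    )
    ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
    return state
```

Once the hook is registered, the training loop itself is unchanged. Only the way gradients are exchanged between workers differs.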
In addition to its training benefits, the algorithm's efficiency means that training uses less power, helping to reduce energy use, which is important in the fight against climate change.
Looking ahead, the EPFL team that developed PowerSGD has recently been working to extend the principle to decentralized training as well, where agents can collaboratively train a deep learning model without the need for any central server, and without risking leaks of their data. This can be a crucial enabler for privacy-sensitive applications such as medical use cases or personal mobile devices.