Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning
Oct 27, 2024
Shenggan Cheng
Shengjie Lin
Lansong Diao
Hao Wu
Siyu Wang
Chang Si
Ziming Liu
Xuanlei Zhao
Jiangsu Du
Wei Lin
Yang You
Abstract
With the exponential growth of deep learning (DL) workloads, there is an escalating need for scalability. Despite significant advances in communication hardware, the time spent in communication remains a bottleneck during training. Existing optimizations are tightly coupled to specific parallel systems in order to implement particular computation-communication overlaps, an approach that poses challenges for performance, programmability, and generality. In this paper, we introduce Concerto, a compiler framework that addresses these challenges by automatically optimizing and scheduling communication. We formulate the scheduling problem as a resource-constrained project scheduling problem (RCPSP) and use an off-the-shelf solver to obtain near-optimal schedules, and we apply auto-decomposition to create overlap opportunities for critical (synchronous) communication. Our evaluation shows that Concerto matches or outperforms state-of-the-art parallel frameworks, including Megatron-LM, DeepSpeed, and Alpa, all of which rely on extensive hand-crafted communication optimization. Unlike prior work, Concerto decouples the parallelization approach from communication optimization, and can therefore generalize to a wide variety of parallelisms without manual optimization.
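To make the RCPSP formulation concrete, below is a minimal sketch (not Concerto's actual implementation) of scheduling a tiny computation-communication graph with Google OR-Tools' CP-SAT as the off-the-shelf solver. The task names, durations, and the two unit-capacity "streams" (compute and communication) are assumptions made for illustration only.

```python
# Illustrative RCPSP sketch: schedule compute and communication ops on two
# unit-capacity "streams" so that communication can overlap computation.
# All task names and durations below are hypothetical, not from the paper.
from ortools.sat.python import cp_model

# (name, duration, resource, dependencies)
tasks = [
    ("fwd_gemm_0",   4, "compute", []),
    ("allgather_0",  3, "comm",    []),
    ("fwd_gemm_1",   4, "compute", ["fwd_gemm_0", "allgather_0"]),
    ("reducescat_0", 3, "comm",    ["fwd_gemm_1"]),
    ("fwd_gemm_2",   4, "compute", ["fwd_gemm_1"]),
]

model = cp_model.CpModel()
horizon = sum(d for _, d, _, _ in tasks)  # trivial upper bound on makespan

starts, ends = {}, {}
intervals = {"compute": [], "comm": []}
for name, dur, res, _ in tasks:
    s = model.NewIntVar(0, horizon, f"start_{name}")
    e = model.NewIntVar(0, horizon, f"end_{name}")
    iv = model.NewIntervalVar(s, dur, e, f"iv_{name}")
    starts[name], ends[name] = s, e
    intervals[res].append(iv)

# Precedence edges: an op may start only after its dependencies finish.
for name, _, _, deps in tasks:
    for dep in deps:
        model.Add(starts[name] >= ends[dep])

# Each stream runs one op at a time (unit-capacity cumulative resource),
# but ops on different streams are free to overlap.
for ivs in intervals.values():
    model.AddCumulative(ivs, [1] * len(ivs), 1)

# Objective: minimize the makespan (finish time of the last op).
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, list(ends.values()))
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name, _, res, _ in tasks:
        print(f"{name:>13} [{res:7}] starts at t={solver.Value(starts[name])}")
    print("makespan =", solver.Value(makespan))
```

In this toy model, the two unit-capacity cumulative constraints serialize ops within each stream while leaving compute and communication intervals free to overlap, which is exactly the degree of freedom the solver exploits; the solver hides `allgather_0` behind `fwd_gemm_0` rather than paying for it on the critical path.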
Type
Publication
ASPLOS 2025