Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning

Oct 27, 2024
Shenggan Cheng
,
Shengjie Lin
,
Lansong Diao
,
Hao Wu
,
Siyu Wang
,
Chang Si
,
Ziming Liu
,
Xuanlei Zhao
,
Jiangsu Du
,
Wei Lin
,
Yang You
Abstract
With the exponential growth of deep learning (DL), there is an escalating need for scalability. Despite significant advances in communication hardware, the time spent on communication remains a bottleneck during training. Existing optimizations are tightly coupled to specific parallel systems in order to implement particular computation-communication overlaps, which poses challenges for performance, programmability, and generality. In this paper, we introduce Concerto, a compiler framework that addresses these challenges by automatically optimizing and scheduling communication. We formulate the scheduling problem as a resource-constrained project scheduling problem and use an off-the-shelf solver to obtain near-optimal schedules, and we apply automatic decomposition to create overlap opportunities for critical (synchronous) communication. Our evaluation shows that Concerto can match or outperform state-of-the-art parallel frameworks, including Megatron-LM, DeepSpeed, and Alpa, all of which rely on extensive hand-crafted communication optimizations. Unlike previous work, Concerto decouples communication optimization from the parallelization approach and can therefore generalize to a wide variety of parallelisms without manual optimization.
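As a rough illustration of the scheduling formulation mentioned in the abstract, the sketch below models a handful of compute and communication operations as a small resource-constrained project scheduling problem and solves it with an off-the-shelf constraint solver (Google OR-Tools CP-SAT). The task names, durations, resource labels, and the choice of solver are illustrative assumptions, not details of Concerto's actual implementation.

```python
# Hypothetical sketch: schedule compute and communication ops as an RCPSP
# and minimize the makespan with OR-Tools CP-SAT. Not Concerto's code.
from ortools.sat.python import cp_model

# (name, duration, resource): compute ops occupy the "compute" stream,
# communication ops occupy the "network" stream; ops sharing a resource
# cannot overlap, while ops on different resources may run concurrently.
tasks = [
    ("fwd_layer0", 4, "compute"),
    ("allgather_w1", 3, "network"),     # prefetch weights for layer 1
    ("fwd_layer1", 4, "compute"),
    ("reducescatter_g0", 3, "network"),
]
# Precedence edges: (before, after).
deps = [("allgather_w1", "fwd_layer1"), ("fwd_layer0", "reducescatter_g0")]

model = cp_model.CpModel()
horizon = sum(d for _, d, _ in tasks)
start, end, interval = {}, {}, {}
for name, dur, _ in tasks:
    start[name] = model.NewIntVar(0, horizon, f"s_{name}")
    end[name] = model.NewIntVar(0, horizon, f"e_{name}")
    interval[name] = model.NewIntervalVar(start[name], dur, end[name], f"i_{name}")

# Precedence constraints between dependent ops.
for a, b in deps:
    model.Add(end[a] <= start[b])

# Each resource (compute stream / network stream) runs one op at a time.
for res in {r for _, _, r in tasks}:
    model.AddNoOverlap([interval[n] for n, _, r in tasks if r == res])

# Objective: minimize the time at which the last op finishes.
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, [end[n] for n, _, _ in tasks])
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name, _, _ in tasks:
        print(name, solver.Value(start[name]), "->", solver.Value(end[name]))
```

In this toy instance the solver overlaps the weight all-gather for layer 1 with the forward pass of layer 0, which is the kind of computation-communication overlap the paper's scheduler discovers automatically at much larger scale.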
Type
Publication
ASPLOS 2025