To appear at PPoPP '25. A novel pipeline parallelism scheme that communicates model weights rather than activations in long-sequence scenarios.
Nov 10, 2024
A compiler framework that automatically optimizes and schedules communication.
Oct 27, 2024
An efficient heterogeneous parallel inference system for LLMs on resource-constrained devices.
May 13, 2024
Dynamic Sequence Parallelism for multi-dimensional transformers.
Jan 1, 2024
Accepted at SC '23. An efficient pipeline parallelism system for LLMs.
Nov 11, 2023