Efficient Heterogeneous Parallel Inference System for LLMs on resource-constrained devices.
May 13, 2024
Dynamic Sequence Parallelism for multi-dimensional transformers.
Jan 1, 2024
Accepted at SC '23. Efficient Pipeline Parallelism System for LLMs.
Nov 11, 2023