Efficient Sequence Parallelism System for Transformer model training.
Jun 30, 2024
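As a hedged illustration of the general idea behind sequence parallelism (not this system's actual implementation), the sketch below shards the sequence dimension across workers and gathers keys/values so each worker's local queries still attend over the full sequence. Everything here is a single-process simulation with hypothetical names; a real system would replace the concatenation with collective communication (e.g. an all-gather) across devices.

```python
# Minimal CPU simulation of sequence parallelism: each "worker" owns one
# contiguous chunk of the sequence; queries stay local, keys/values are
# gathered from all workers so attention still spans the whole sequence.
import torch

def attention(q, k, v):
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

seq_len, dim, num_workers = 16, 8, 4
x = torch.randn(seq_len, dim)

# Shard the sequence dimension across workers.
chunks = list(x.chunk(num_workers, dim=0))

# Stand-in for an all-gather of keys/values across workers.
k_full = torch.cat(chunks, dim=0)
v_full = torch.cat(chunks, dim=0)

# Each worker computes attention only for its local queries.
outputs = [attention(q_local, k_full, v_full) for q_local in chunks]
out = torch.cat(outputs, dim=0)

# The sharded result matches single-device attention over the full sequence.
assert torch.allclose(out, attention(x, x, x), atol=1e-5)
```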
Memory-efficient long-sequence inference through automated activation chunking.
May 7, 2024
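A minimal sketch of what activation chunking looks like in practice, under my own assumptions rather than this project's code: a position-wise layer is run over the sequence in chunks, so the large intermediate activation of only one chunk is alive at a time. The project automates the chunking decision; here the chunk size is simply fixed by hand, and the `ChunkedMLP` class is hypothetical.

```python
# Process a position-wise MLP over the sequence in fixed-size chunks so peak
# activation memory scales with chunk_size instead of the full sequence length.
import torch
import torch.nn as nn

class ChunkedMLP(nn.Module):
    def __init__(self, dim, hidden, chunk_size):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)   # expands to the hidden size
        self.fc2 = nn.Linear(hidden, dim)   # projects back down
        self.chunk_size = chunk_size

    def forward(self, x):  # x: (seq_len, dim)
        outs = []
        for chunk in x.split(self.chunk_size, dim=0):
            # The (chunk_size, hidden) intermediate exists only inside this
            # loop iteration, so it never materializes for the whole sequence.
            outs.append(self.fc2(torch.relu(self.fc1(chunk))))
        return torch.cat(outs, dim=0)

mlp = ChunkedMLP(dim=64, hidden=256, chunk_size=128)
y = mlp(torch.randn(4096, 64))
print(y.shape)  # torch.Size([4096, 64])
```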
Adaptive Tensor Parallelism for efficient foundation model training.
Jan 1, 2023
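For context, here is a hedged sketch of plain column-wise tensor parallelism, the building block this project adapts; the adaptive policy itself (how layouts are chosen per layer) is not shown, and the single-process concatenation below stands in for the all-gather a real multi-device setup would perform.

```python
# Column-wise tensor parallelism in miniature: the weight matrix is split
# along its output dimension across workers, each worker computes a slice of
# the output independently, and the slices are gathered at the end.
import torch

in_dim, out_dim, num_workers = 32, 64, 4
x = torch.randn(8, in_dim)           # a batch of activations
w = torch.randn(in_dim, out_dim)     # the full weight matrix

# Each worker holds one column shard of the weight.
shards = w.chunk(num_workers, dim=1)

# Workers compute their output slices with no communication until the gather.
partial = [x @ w_shard for w_shard in shards]
y = torch.cat(partial, dim=1)

assert torch.allclose(y, x @ w, atol=1e-5)
```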
An inference system for efficiently serving transformer models with 10-100 billion parameters.
Jan 1, 2022
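A back-of-envelope estimate (my own arithmetic, not a result from this project) of why models in this size range need a dedicated serving system: fp16 weights alone take 2 bytes per parameter, before the KV cache and activations are counted, so the larger models cannot fit on a single accelerator.

```python
# Rough weight-memory estimate for fp16 models of 10B and 100B parameters,
# assuming 80 GiB accelerators; KV cache and activations are ignored here.
GIB = 1024 ** 3

def min_gpus(num_params, bytes_per_param=2, gpu_mem_gib=80):
    weights_gib = num_params * bytes_per_param / GIB
    gpus = -(-weights_gib // gpu_mem_gib)   # ceiling division
    return weights_gib, int(gpus)

for n in (10e9, 100e9):
    weights, gpus = min_gpus(n)
    print(f"{n / 1e9:.0f}B params: ~{weights:.0f} GiB of fp16 weights, "
          f">= {gpus} x 80 GiB GPUs for the weights alone")
```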