RAS, the first diffusion sampling strategy that allows regional variability in sampling ratios, with speedups that can exceed 2x!
Feb 14, 2025
An efficient sequence parallelism system for Transformer model training.
Jun 30, 2024
Memory-efficient long sequence inference through automated activation chunking.
May 7, 2024
Adaptive Tensor Parallelism for efficient foundation model training.
Jan 1, 2023
An inference system designed to efficiently serve transformer models with 10 to 100 billion parameters.
Jan 1, 2022