EaaS, a novel serving system for efficient, scalable, and robust MoE deployment.
Sep 22, 2025
RAS, the first diffusion sampling strategy that allows sampling ratios to vary by region, achieving speedups that exceed 2x.
Feb 14, 2025
NeurIPS '25. An efficient sequence parallelism system for Transformer model training.
Jun 30, 2024
Memory-efficient long-sequence inference through automated activation chunking.
May 7, 2024
Adaptive Tensor Parallelism for efficient foundation model training.
Jan 1, 2023
An inference system designed to efficiently handle transformer models with 10 to 100 billion parameters.
Jan 1, 2022