Article

EaaS, a novel serving system that enables efficient, scalable, and robust MoE deployment.

Sep 22, 2025

RAS, the first diffusion sampling strategy that allows sampling ratios to vary by region, achieving up to 2x+ speedup.

Feb 14, 2025

NeurIPS '25. An efficient sequence parallelism system for Transformer model training.

Jun 30, 2024

Memory-efficient long sequence inference through automated activation chunking.

May 7, 2024

Adaptive Tensor Parallelism for efficient foundation model training.

Jan 1, 2023

An inference system designed to efficiently serve transformer models with 10 to 100 billion parameters.

Jan 1, 2022