Publications

(2024). WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training. arXiv preprint.
(2024). HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices. In MLSys 2024, Proceedings of Machine Learning and Systems.
(2024). AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference. In ICLR 2024, International Conference on Learning Representations.
(2024). DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers. arXiv preprint.
(2023). Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency. In SC '23, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.
(2023). ATP: Adaptive Tensor Parallelism for Foundation Models. arXiv preprint.
(2022). EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models. arXiv preprint.