Ziming Liu | MLSYS Researcher

🚀

Ziming Liu 子铭刘

Ph.D. Candidate

About Me

Hi, I am a third-year CS Ph.D. candidate at NUS, supervised by Prof. Yang You and work as a member of HPC-AI lab. I received my bachelor’s degree of computer science and engineering at Peking University in 2020, supervised by Prof. Tong Yang. Currently I am working for a stealth startup as a research intern on large-scale MoE model serving system. Previously I was an intern at Microsoft Research, supervised by Dr. Zhenhua Han and Dr. Yuqing Yang.

My research interests are machine learning systems and high performance computing. I have been working on various parallel techniques in large model training and inference, and I am also interested in the sparsity of deep learning models. I am looking forward to collaborations and research internship opportunities, so please feel free to contact me if you are interested in my research.

Download CV

Interests

Machine Learning System
High Performance Computing
Distributed Training & Inference
Sparse Inference & Training

Education

PhD Computer Science
National University of Singapore
Msc Artificial Intelligence
National University of Singapore
BSc Computer Science
Peking University

Experience

Research Intern
A Stealth Startup (To be announced soon) January 2025 – Present
Working on large scale MoE model serving system. Paper to be released soon.
Research Intern
Microsoft Research May 2024 – November 2024
Working on sparse inference and training of text-to-image and text-to-video models. Supervised by Dr. Zhenhua Han and Dr. Yuqing Yang. Awarded the “Stars of Tomorrow” Certificate from MSRA (top 10% intern).
Research Intern
HPC-AI Tech May 2022 – December 2022
Responsibilities include:
- Developing the efficient LLM inference system EnergonAI.
- Optimizing the implementation of ColossalAI.
Machine Learning Engineer
ByteDance September 2020 – June 2021
NLP algorithm engineer at Lark, ByteDance.

Education

PhD Computer Science
National University of Singapore January 2023 – Present
Working on Machine Learning System, supervised by Presidential Young Professor Yang You.
Msc Artificial Intelligence
National University of Singapore August 2021 – January 2023
BSc Computer Science
Peking University September 2016 – July 2020
Bacholor’s degree of computer science and engineering. Supervised by Prof. Tong Yang.

Featured Publications

Diffusion Transformer

Region-Adaptive Sampling for Diffusion Transformers

RAS, the first diffusion sampling strategy that allows for regional variability in sampling ratios, achieving up to 2x+ speedup!

Feb 14, 2025

Pipeline Parallelism

WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training

To appear on PPoPP '25. A novel pipeline parallelism that communicate model weight rather than activation under long-sequence scenarios.

Nov 10, 2024

Sequence Parallelism

WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training

Efficient Sequence Parallelism System for Transformer model training.

Jun 30, 2024

Pipeline Parallelism

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

Accepted by SC '23. Efficient Pipeline Parallelism System for LLM.

Nov 11, 2023

Recent Publications

Ziming Liu, Yifan Yang, Chengruidong Zhang, Yiqi Zhang, Lili Qiu, Yang You, Yuqing Yang (2025). Region-Adaptive Sampling for Diffusion Transformers. Arxiv Preprint.

PDF Cite Code Project Poster

Junfeng Lin*, Ziming Liu*, Yang You, Jun Wang, Weihao Zhang, Rong Zhao (2024). WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training. To appear on PPoPP 2025. *:Equal Contribution.

Cite

Shenggan Cheng, Shengjie Lin, Lansong Diao, Hao Wu, Siyu Wang, Chang Si, Ziming Liu, Xuanlei Zhao, Jiangsu Du, Wei Lin, Yang You (2024). Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning. To appear on ASPLOS 2025.

Cite

Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Kai Wang, Xuanlei Zhao, James Demmel, Yang You (2024). WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training. Arxiv Preprint.

PDF Cite

Xuanlei Zhao, Bin Jia, Haotian Zhou, Ziming Liu, Shenggan Cheng, Yang You (2024). HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices. In MLSys 2024, Proceedings of Machine Learning and Systems.

PDF Cite

Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You (2024). AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference. In ICLR 2024, International Conference on Learning Representations.

PDF Cite

Xuanlei Zhao, Shenggan Cheng, Zangwei Zheng, Zheming Yang, Ziming Liu, Yang You (2024). DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers. Arxiv Preprint.

PDF Cite Project

Ziming Liu*, Shenggan Cheng*, Haotian Zhou, Yang You (2023). Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency. In SC ‘23, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. *:Equal Contribution.

PDF Cite

Shenggan Cheng, Ziming Liu, Jiangsu Du, Yang You (2023). ATP: Adaptive Tensor Parallelism for Foundation Models. Arxiv Preprint.

PDF Cite

Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You (2022). EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models. Arxiv Preprint.

PDF Cite Project

Ph.D. Candidate

About Me

Experience

Research Intern

Research Intern

Research Intern

Machine Learning Engineer

Education

PhD Computer Science

Msc Artificial Intelligence

BSc Computer Science

Region-Adaptive Sampling for Diffusion Transformers

WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training

WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency