Ziming Liu 子铭刘

Ph.D. Candidate

About Me

Hi, I am a second-year CS Ph.D. candidate at NUS, supervised by Prof. Yang You, and a member of the HPC-AI Lab. I received my bachelor's degree in computer science and engineering from Peking University in 2020, where I was supervised by Prof. Tong Yang. Previously, I was an intern at Microsoft Research, supervised by Dr. Zhenhua Han and Dr. Yuqing Yang.

My research interests are machine learning systems and high-performance computing. I have been working on pipeline parallelism and sequence parallelism for deep learning training, and I am currently exploring sparse inference and training of deep learning models. I am always open to collaborations and research internship opportunities, so please feel free to reach out if you are interested in my research.

Download CV
Interests
  • Machine Learning Systems
  • High Performance Computing
  • Distributed Training & Inference
  • Sparse Inference & Training
Education
  • PhD Computer Science

    National University of Singapore

  • MSc Artificial Intelligence

    National University of Singapore

  • BSc Computer Science

    Peking University

Experience

  1. Research Intern

    Microsoft Research
    Worked on sparse inference and training of text-to-image and text-to-video models. Supervised by Dr. Zhenhua Han and Dr. Yuqing Yang.
  2. Research Intern

    HPC-AI Tech

    Responsibilities included:

    • Developing EnergonAI, an efficient LLM inference system.
    • Optimizing the implementation of ColossalAI.
  3. Machine Learning Engineer

    ByteDance
    NLP algorithm engineer at Lark, ByteDance.

Education

  1. PhD Computer Science

    National University of Singapore
    Working on machine learning systems, supervised by Presidential Young Professor Yang You.
  2. MSc Artificial Intelligence

    National University of Singapore
  3. BSc Computer Science

    Peking University
    Bachelor's degree in computer science and engineering, supervised by Prof. Tong Yang.
Recent Publications
(2024). WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training. To appear at PPoPP 2025. *: Equal contribution.
(2024). Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning. To appear at ASPLOS 2025.
(2024). WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training. arXiv preprint.
(2024). HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices. In MLSys 2024, Proceedings of Machine Learning and Systems.
(2024). AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference. In ICLR 2024, International Conference on Learning Representations.
(2024). DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers. arXiv preprint.
(2023). Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency. In SC '23, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. *: Equal contribution.
(2023). ATP: Adaptive Tensor Parallelism for Foundation Models. arXiv preprint.
(2022). EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models. arXiv preprint.