To appear at PPoPP '25. A novel pipeline parallelism scheme that communicates model weights rather than activations in long-sequence scenarios.
Nov 10, 2024
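A rough element count shows why moving weights can be cheaper than moving activations once sequences get long. This is a sketch under simplifying assumptions, not the paper's cost model. It compares two quantities. The first is one micro-batch's activation tensor at a stage boundary, which holds sequence length × micro-batch size × hidden size elements. The second is one transformer layer's weights, approximated as 12h² elements: 4h² for attention plus 8h² for a 4× MLP, ignoring biases and norms. The function names are hypothetical.

```python
# Back-of-envelope comparison (assumption: standard transformer layer, 4x MLP,
# biases and layer norms ignored). Counts elements, not bytes.

def activation_elems(seq_len: int, micro_batch: int, hidden: int) -> int:
    # Hypothetical helper: activation tensor sent across a pipeline stage
    # boundary, one hidden vector per token.
    return seq_len * micro_batch * hidden


def layer_weight_elems(hidden: int) -> int:
    # Hypothetical helper: attention (4h^2) + MLP (8h^2) weights of one layer.
    return 12 * hidden * hidden


def break_even_tokens(hidden: int) -> int:
    # seq_len * micro_batch at which both counts are equal: s*b*h == 12*h^2.
    return 12 * hidden


def weights_cheaper(seq_len: int, micro_batch: int, hidden: int) -> bool:
    # True when one layer's weights are smaller than one micro-batch's activations.
    return layer_weight_elems(hidden) < activation_elems(seq_len, micro_batch, hidden)


if __name__ == "__main__":
    h = 4096
    for s in (4096, 131072):
        print(s, activation_elems(s, 1, h), layer_weight_elems(h), weights_cheaper(s, 1, h))
```

With h = 4096 the break-even point is 49,152 tokens per micro-batch. A 4K-token sequence favors sending activations, while a 128K-token sequence (2^29 activation elements vs. about 2×10^8 weight elements) favors sending weights.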