sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆317 · Updated 2 months ago
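The headline project implements zero-bubble pipeline schedules. The core idea from the associated paper is to split each stage's backward pass into a B pass (gradient w.r.t. the stage input, which the upstream stage is blocked on) and a W pass (gradient w.r.t. the weights, which can be deferred to fill pipeline bubbles). Below is a minimal, hypothetical PyTorch sketch of that split; `run_stage_backward_split` and the surrounding names are illustrative, not the repository's actual API.

```python
# Hypothetical sketch of the B/W backward split behind zero-bubble
# schedules. Not the repository's API; names here are illustrative.
import torch
import torch.nn as nn


def run_stage_backward_split(stage: nn.Module, x: torch.Tensor,
                             grad_out: torch.Tensor):
    """Compute the B pass now; return a thunk that runs the W pass later."""
    x = x.detach().requires_grad_(True)
    out = stage(x)  # forward for this microbatch (cached or recomputed)

    # B pass: input gradient only -- this is what unblocks the upstream stage.
    (grad_in,) = torch.autograd.grad(
        out, x, grad_outputs=grad_out, retain_graph=True
    )

    params = [p for p in stage.parameters() if p.requires_grad]

    def w_pass():
        # W pass: weight gradients, schedulable into an idle pipeline bubble.
        grads = torch.autograd.grad(out, params, grad_outputs=grad_out)
        for p, g in zip(params, grads):
            p.grad = g if p.grad is None else p.grad + g

    return grad_in, w_pass


# Usage: send grad_in upstream immediately; queue w_pass() for a bubble.
stage = nn.Linear(16, 16)
x, g = torch.randn(4, 16), torch.randn(4, 16)
grad_in, w_pass = run_stage_backward_split(stage, x, g)
w_pass()  # deferred weight-gradient computation
```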
Alternatives and similar repositories for zero-bubble-pipeline-parallelism:
Users interested in zero-bubble-pipeline-parallelism are comparing it to the libraries listed below.
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆415 · Updated 3 weeks ago
- A fast communication-overlapping library for tensor parallelism on GPUs. ☆280 · Updated 3 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆204 · Updated 5 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆273 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆86 · Updated 3 weeks ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆233 · Updated 7 months ago
- Ring attention implementation with flash attention ☆660 · Updated last month
- Disaggregated serving system for Large Language Models (LLMs). ☆453 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch