ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 · Updated 4 months ago
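For context, Chimera's core idea is to run two pipelines in opposite directions over the same set of workers: each worker hosts one stage from each direction, and the micro-batches are split between the two pipelines so that each half fills the other's bubbles. Below is a minimal, illustrative sketch of that stage placement in plain Python; the names `down`/`up` and the even-`D` assumption are ours, not the repository's API.

```python
# Illustrative sketch of Chimera-style bidirectional pipeline placement
# (not the ParCIS/Chimera API). With D workers, the "down" pipeline maps
# stage i to worker i, while the "up" pipeline maps stage i to worker
# D - 1 - i, so each worker hosts exactly one stage from each direction.

D = 4  # number of workers / pipeline stages (assumed even here)

down = {stage: stage for stage in range(D)}        # stage -> worker
up = {stage: D - 1 - stage for stage in range(D)}  # stage -> worker

for worker in range(D):
    hosted = [f"down-stage{s}" for s, w in down.items() if w == worker]
    hosted += [f"up-stage{s}" for s, w in up.items() if w == worker]
    print(f"worker {worker}: {', '.join(hosted)}")

# worker 0 hosts down-stage0 and up-stage3, worker 1 hosts down-stage1
# and up-stage2, and so on: the two directions mirror each other.
```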
Alternatives and similar repositories for Chimera
Users interested in Chimera are comparing it to the libraries listed below.
- ☆75 · Updated 4 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆114 · Updated this week
- ☆150 · Updated last year
- A lightweight design for computation-communication overlap. ☆154 · Updated last month
- ☆80 · Updated 2 years ago
- ☆81 · Updated 2 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆144 · Updated 3 years ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 · Updated 8 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆216 · Updated last year
- ☆65 · Updated last year
- Official repository for the paper "DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines" ☆19 · Updated last year
- GitHub mirror of the triton-lang/triton repo. ☆48 · Updated this week
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 · Updated 2 years ago
- ☆227 · Updated last year
- ☆102 · Updated 7 months ago
- Complete GPU residency for ML. ☆37 · Updated last week
- ☆109 · Updated 8 months ago
- A resilient distributed training framework ☆95 · Updated last year
- ☆96 · Updated 10 months ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆122 · Updated 3 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆106 · Updated 2 months ago
- DietCode Code Release ☆64 · Updated 3 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) for deep learning on Tensor Cores. ☆89 · Updated 2 years ago
- High-performance Transformer implementation in C++. ☆128 · Updated 6 months ago
- Sequence-level 1F1B schedule for LLMs. ☆29 · Updated last month
- LLM serving cluster simulator ☆108 · Updated last year
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI '23) ☆83 · Updated 2 years ago
- Thunder Research Group's Collective Communication Library ☆39 · Updated 3 weeks ago
- Synthesizer for optimal collective communication algorithms ☆111 · Updated last year
- ☆86 · Updated 3 years ago