alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆78 Updated 5 months ago
Alternatives and similar repositories for easydist:
Users who are interested in easydist are comparing it to the libraries listed below.
- A lightweight design for computation-communication overlap. ☆35 Updated last week
- ☆93 Updated 7 months ago
- ☆72 Updated 4 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆109 Updated this week
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆62 Updated last month
- High performance Transformer implementation in C++. ☆119 Updated 3 months ago
- DeeperGEMM: crazy optimized version ☆68 Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 Updated this week
- ☆59 Updated 10 months ago
- ☆79 Updated 2 years ago
- ☆57 Updated last week
- Stateful LLM Serving ☆65 Updated last month
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆34 Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆115 Updated last month
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆38 Updated 2 years ago
- ☆70 Updated 4 months ago
- ☆60 Updated last month
- A resilient distributed training framework ☆95 Updated last year
- ☆95 Updated 5 months ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆81 Updated last year
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆121 Updated 2 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆74 Updated last month
- Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving” ☆60 Updated 11 months ago
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆53 Updated 9 months ago
- Microsoft Collective Communication Library ☆65 Updated 5 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆181 Updated 3 months ago
- ☆142 Updated 9 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆87 Updated this week
- ☆84 Updated last month
- ☆78 Updated 5 months ago