microsoft / ark
A GPU-driven system framework for scalable AI applications
☆114 Updated 3 months ago
Alternatives and similar repositories for ark:
Users who are interested in ark are comparing it to the libraries listed below:
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆345 Updated this week
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆34 Updated last week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆80 Updated this week
- ☆26 Updated last year
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 9 months ago
- A lightweight design for computation-communication overlap. ☆67 Updated last week
- An experimental parallel training platform ☆54 Updated last year
- NCCL Profiling Kit ☆133 Updated 10 months ago
- Microsoft Collective Communication Library ☆65 Updated 5 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆58 Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆84 Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆115 Updated last month
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆78 Updated 5 months ago
- ☆57 Updated last week
- ☆79 Updated 2 years ago
- ☆60 Updated last month
- Thunder Research Group's Collective Communication Library ☆36 Updated last year
- ☆78 Updated 6 months ago
- Synthesizer for optimal collective communication algorithms ☆106 Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 Updated 2 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆181 Updated 3 months ago
- Microsoft Collective Communication Library ☆344 Updated last year
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated last week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆138 Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 Updated 7 months ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆94 Updated 2 years ago
- Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving” ☆60 Updated 11 months ago
- MLIR-based partitioning system ☆82 Updated this week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆74 Updated last month
- ☆25 Updated 2 months ago