microsoft / ark
A GPU-driven system framework for scalable AI applications
☆103 · Updated this week
Related projects:
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆233 · Updated this week
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆55 · Updated 4 months ago
- Microsoft Collective Communication Library ☆42 · Updated 4 months ago
- An experimental CPU backend for Triton ☆36 · Updated last week
- An experimental parallel training platform ☆46 · Updated 5 months ago
- NCCL Profiling Kit ☆104 · Updated 2 months ago
- Synthesizer for optimal collective communication algorithms ☆94 · Updated 5 months ago
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆54 · Updated 3 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆186 · Updated last month
- A low-latency & high-throughput serving engine for LLMs ☆174 · Updated last week
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆70 · Updated last month
- Microsoft Collective Communication Library ☆304 · Updated 11 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆89 · Updated last week
- A fast communication-overlapping library for tensor parallelism on GPUs. ☆184 · Updated this week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆188 · Updated 3 weeks ago
- High performance Transformer implementation in C++. ☆67 · Updated this week
- A validation and profiling tool for AI infrastructure ☆252 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆160 · Updated this week
- GVProf: A Value Profiler for GPU-based Clusters ☆46 · Updated 5 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆92 · Updated last year
- An interference-aware scheduler for fine-grained GPU sharing ☆92 · Updated 4 months ago
- An IR for efficiently simulating distributed ML computation. ☆24 · Updated 8 months ago
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles. ☆114 · Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆56 · Updated 3 weeks ago
- Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs. ☆66 · Updated last year
- OpenAI Triton backend for Intel® GPUs ☆126 · Updated this week