intel / intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension to DeepSpeed that adds feature support through SYCL kernels on Intel GPU (XPU) devices. Note that XPU is already supported in stock (upstream) DeepSpeed.
☆62 Updated last month
Alternatives and similar repositories for intel-extension-for-deepspeed:
Users who are interested in intel-extension-for-deepspeed are comparing it to the libraries listed below:
- oneCCL Bindings for Pytorch* ☆94 Updated 2 weeks ago
- ☆38 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆179 Updated last week
- oneAPI Collective Communications Library (oneCCL) ☆232 Updated 3 weeks ago
- CUDA Templates for Linear Algebra Subroutines ☆20 Updated last week
- ☆60 Updated 4 months ago
- ☆29 Updated this week
- RCCL Performance Benchmark Tests ☆64 Updated last week
- Microsoft Collective Communication Library ☆65 Updated 5 months ago
- Synthesizer for optimal collective communication algorithms ☆105 Updated last year
- Intel® Tensor Processing Primitives extension for Pytorch* ☆14 Updated this week
- ☆68 Updated 3 weeks ago
- ☆46 Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆341 Updated this week
- NCCL Profiling Kit ☆130 Updated 9 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆136 Updated this week
- ☆78 Updated 5 months ago
- ☆22 Updated 2 months ago
- Benchmarks to capture important workloads. ☆31 Updated 2 months ago
- Development repository for the Triton language and compiler ☆118 Updated this week
- Multi-GPU communication profiler and visualizer ☆28 Updated 10 months ago
- AI Tensor Engine for ROCm ☆180 Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆409 Updated last week
- Ahead of Time (AOT) Triton Math Library ☆57 Updated last week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆76 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆67 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 9 months ago
- ☆16 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆326 Updated this week