HabanaAI / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13 Updated last month
Alternatives and similar repositories for DeepSpeed
Users who are interested in DeepSpeed are comparing it to the libraries listed below.
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆41 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 Updated this week
- RCCL Performance Benchmark Tests ☆75 Updated last week
- oneCCL Bindings for Pytorch* ☆102 Updated 2 months ago
- oneAPI Collective Communications Library (oneCCL) ☆245 Updated 3 weeks ago
- Development repository for the Triton language and compiler ☆135 Updated this week
- ☆63 Updated 10 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 Updated 3 months ago
- Intel® Tensor Processing Primitives extension for Pytorch* ☆17 Updated 2 weeks ago
- ROCm Communication Collectives Library (RCCL) ☆389 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆211 Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆357 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆119 Updated this week
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆165 Updated 3 weeks ago
- Bandwidth test for ROCm ☆66 Updated this week
- ☆59 Updated this week
- ☆48 Updated this week
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆23 Updated 6 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆115 Updated this week
- oneAPI Level Zero Conformance & Performance test content ☆57 Updated this week
- ☆19 Updated last week
- ☆23 Updated last week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆472 Updated this week
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆43 Updated 8 months ago
- ☆152 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆106 Updated last year
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 Updated last month
- Experimental projects related to TensorRT ☆113 Updated last week
- Microsoft Collective Communication Library ☆66 Updated 10 months ago
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆247 Updated last week