HabanaAI / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆12 · Updated last month
Alternatives and similar repositories for DeepSpeed:
Users who are interested in DeepSpeed are comparing it to the libraries listed below:
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆60 · Updated 2 months ago
- oneCCL Bindings for Pytorch* ☆88 · Updated last month
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated last month
- ☆60 · Updated last month
- ☆34 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆165 · Updated this week
- RCCL Performance Benchmark Tests ☆59 · Updated 3 weeks ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆31 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆344 · Updated this week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆217 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆56 · Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆222 · Updated 3 weeks ago
- ☆18 · Updated this week
- Intel® Tensor Processing Primitives extension for Pytorch* ☆10 · Updated this week
- ☆20 · Updated last year
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 · Updated last month
- ☆20 · Updated last month
- ☆18 · Updated 2 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆73 · Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆14 · Updated this week
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆20 · Updated 2 months ago
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆134 · Updated this week
- RDC ☆26 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated this week
- ☆43 · Updated this week
- Computation using data flow graphs for scalable machine learning ☆67 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆78 · Updated this week
- OpenVINO LLM Benchmark ☆11 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆87 · Updated 7 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆169 · Updated this week