ROCm / MAD
☆18 · Updated this week
Alternatives and similar repositories for MAD:
Users who are interested in MAD are comparing it to the libraries listed below.
- Ongoing research training transformer models at scale · ☆18 · Updated this week
- RCCL Performance Benchmark Tests · ☆64 · Updated this week
- ☆29 · Updated this week
- ☆20 · Updated 3 weeks ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… · ☆91 · Updated this week
- Bandwidth test for ROCm · ☆54 · Updated 2 weeks ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… · ☆136 · Updated this week
- AI Tensor Engine for ROCm · ☆180 · Updated this week
- LLM Inference analyzer for different hardware platforms · ☆62 · Updated 2 weeks ago
- Microsoft Collective Communication Library · ☆65 · Updated 5 months ago
- A hierarchical collective communications library with portable optimizations · ☆33 · Updated 4 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. · ☆76 · Updated last week
- ☆22 · Updated 2 months ago
- ☆102 · Updated last month
- ☆38 · Updated this week
- ☆68 · Updated 3 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… · ☆62 · Updated last month
- MSCCL++: A GPU-driven communication stack for scalable AI applications · ☆342 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆75 · Updated this week
- ☆78 · Updated 2 years ago
- NCCL Profiling Kit · ☆132 · Updated 9 months ago
- oneCCL Bindings for Pytorch* · ☆94 · Updated 2 weeks ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) · ☆39 · Updated 2 weeks ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs · ☆83 · Updated this week
- Ahead of Time (AOT) Triton Math Library · ☆57 · Updated last week
- RDC · ☆28 · Updated this week
- Synthesizer for optimal collective communication algorithms · ☆105 · Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems · ☆78 · Updated 5 months ago
- Benchmarks to capture important workloads. · ☆31 · Updated 2 months ago
- An experimental parallel training platform · ☆54 · Updated last year