ROCm / MAD
MAD (Model Automation and Dashboarding)
☆25 Updated last week
Alternatives and similar repositories for MAD
Users who are interested in MAD are comparing it to the libraries listed below.
- ☆48 Updated this week
- RCCL Performance Benchmark Tests ☆75 Updated last week
- Ongoing research training transformer models at scale ☆29 Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆153 Updated this week
- AI Tensor Engine for ROCm ☆285 Updated last week
- Microsoft Collective Communication Library ☆66 Updated 10 months ago
- oneCCL Bindings for Pytorch* ☆102 Updated 2 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆550 Updated 6 months ago
- Development repository for the Triton language and compiler ☆135 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆119 Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 Updated 3 months ago
- NCCL Profiling Kit ☆145 Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆421 Updated last week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆265 Updated 2 months ago
- ROCm Communication Collectives Library (RCCL) ☆389 Updated this week
- ☆46 Updated 10 months ago
- Bandwidth test for ROCm ☆66 Updated this week
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆133 Updated 5 years ago
- oneAPI Collective Communications Library (oneCCL) ☆245 Updated 3 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆357 Updated this week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆226 Updated this week
- A validation and profiling tool for AI infrastructure ☆341 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆211 Updated this week
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆41 Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆121 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆115 Updated this week
- A hierarchical collective communications library with portable optimizations ☆36 Updated 10 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆108 Updated this week
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆185 Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆91 Updated this week