ROCm / rccl
ROCm Communication Collectives Library (RCCL)
☆268 · Updated this week
Related projects
Alternatives and complementary repositories for rccl
- oneAPI Collective Communications Library (oneCCL) ☆206 · Updated this week
- RCCL Performance Benchmark Tests ☆50 · Updated 3 weeks ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 · Updated this week
- Unified Collective Communication Library ☆207 · Updated last week
- ROCm BLAS marshalling library ☆118 · Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆130 · Updated this week
- AMD's graph optimization engine. ☆186 · Updated this week
- ☆311 · Updated 6 months ago
- collection of benchmarks to measure basic GPU capabilities ☆265 · Updated 5 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆321 · Updated last month
- NCCL Profiling Kit ☆112 · Updated 4 months ago
- ROCm Parallel Primitives ☆162 · Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆250 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 · Updated this week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆202 · Updated last week
- STREAM, for lots of devices written in many programming models ☆325 · Updated 2 months ago
- Microsoft Collective Communication Library ☆323 · Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆57 · Updated 2 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 · Updated this week
- Next generation BLAS implementation for ROCm platform ☆346 · Updated this week
- Advanced Profiling and Analytics for AMD Hardware ☆135 · Updated this week
- RAND library for HIP programming language ☆111 · Updated this week
- rocWMMA ☆91 · Updated this week
- This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications. ☆147 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- CUDA Kernel Benchmarking Library ☆519 · Updated this week
- A system validation and diagnostics tool for monitoring, stress testing, detecting, and troubleshooting issues impacting AMD GPUs in high… ☆66 · Updated this week
- Next generation SPARSE implementation for ROCm platform ☆116 · Updated this week
- Examples for HIP ☆200 · Updated 2 weeks ago
- Assembler for NVIDIA Volta and Turing GPUs ☆201 · Updated 2 years ago