ROCm / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
☆22 · Updated last week
Alternatives and similar repositories for apex:
Users who are interested in apex compare it with the libraries listed below.
- ☆29 · Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆20 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆57 · Updated last week
- oneCCL Bindings for Pytorch* ☆94 · Updated 2 weeks ago
- RCCL Performance Benchmark Tests ☆64 · Updated this week
- ☆22 · Updated 2 months ago
- Development repository for the Triton language and compiler ☆118 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆171 · Updated this week
- Benchmarks to capture important workloads. ☆31 · Updated 2 months ago
- OpenAI Triton backend for Intel® GPUs ☆182 · Updated this week
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆91 · Updated 6 years ago
- oneAPI Collective Communications Library (oneCCL) ☆232 · Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 · Updated this week
- Optimize GEMM with tensorcore step by step ☆25 · Updated last year
- ☆60 · Updated 4 months ago
- ☆20 · Updated last month
- ☆68 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆75 · Updated this week
- ☆68 · Updated last month
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆106 · Updated 9 months ago
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆142 · Updated this week
- ROCm BLAS marshalling library ☆138 · Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 · Updated 9 months ago
- An extension library of WMMA API (Tensor Core API) ☆96 · Updated 9 months ago
- ☆78 · Updated 5 months ago
- ☆50 · Updated last year
- rocWMMA ☆109 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆326 · Updated this week
- Reference implementations of MLPerf™ HPC training benchmarks ☆47 · Updated 2 months ago