MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training. For changes, please use https://github.com/NVIDIA/nccl.
☆60 Updated last year
Alternatives and similar repositories for NCCL:
Users who are interested in NCCL compare it to the libraries listed below:
- ONNX Runtime: cross-platform, high performance scoring engine for ML models ☆61 Updated this week
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 Updated last year
- AMD's graph optimization engine. ☆215 Updated this week
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆82 Updated 2 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- Fast and memory-efficient exact attention ☆62 Updated last week
- Example of using PyTorch's open device registration API ☆29 Updated 2 years ago
- ☆124 Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆91 Updated 6 years ago
- Training material for Nsight developer tools ☆156 Updated 8 months ago
- CVFusion is an open-source deep learning compiler to fuse the OpenCV operators. ☆29 Updated 2 years ago
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib ☆57 Updated 2 years ago
- ☆68 Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 Updated last month
- ☆20 Updated 4 years ago
- Converter from MegEngine to other frameworks ☆69 Updated last year
- ☆29 Updated this week
- OneFlow->ONNX ☆43 Updated 2 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 Updated 7 months ago
- oneCCL Bindings for Pytorch* ☆94 Updated 2 weeks ago
- ☆69 Updated 2 years ago
- ☆96 Updated 3 years ago
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated this week
- ☆60 Updated last year
- ☆18 Updated last year
- flexible-gemm conv of deepcore ☆17 Updated 5 years ago
- Computation using data flow graphs for scalable machine learning ☆67 Updated this week
- ☆28 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ☆57 Updated last week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 Updated last year