MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.
☆60 Updated last year
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below:
- AMD's graph optimization engine. ☆266 Updated this week
- ONNX Runtime: cross-platform, high performance scoring engine for ML models ☆74 Updated this week
- MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into … ☆203 Updated last week
- A nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface ☆125 Updated 3 months ago
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it ☆642 Updated this week
- ☆126 Updated last week
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆24 Updated 2 years ago
- Computation using data flow graphs for scalable machine learning ☆68 Updated this week
- Common utilities for ONNX converters ☆284 Updated 2 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆470 Updated 3 weeks ago
- Header-only safetensors loader and saver in C++ ☆71 Updated 6 months ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆427 Updated 10 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆54 Updated last week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆85 Updated last year
- ☆61 Updated this week
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib ☆58 Updated 2 years ago
- BLAS-like Library Instantiation Software Framework ☆155 Updated this week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆246 Updated this week
- Repository for OpenVINO's extra modules ☆148 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆255 Updated this week
- Conversion to/from half-precision floating point formats ☆374 Updated 3 months ago
- A TensorFlow Extension: GPU performance tools for TensorFlow. ☆26 Updated 2 years ago
- AI-related samples made available by the DevTech ProViz team ☆31 Updated last year
- A C++/CUDA template library for tensor lazy evaluation ☆164 Updated 2 years ago
- oneCCL Bindings for PyTorch* (deprecated) ☆102 Updated 2 weeks ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆483 Updated this week
- kmeans clustering with multi-GPU capabilities ☆119 Updated 2 years ago
- Model compression for ONNX ☆98 Updated last year
- A small OpenCL benchmark program to measure peak GPU/CPU performance. ☆262 Updated last week