MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.
☆58Updated last year
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below.
- AMD's graph optimization engine.☆220Updated this week
- nvImageCodec, a library of GPU- and CPU-accelerated codecs featuring a unified interface☆105Updated 2 months ago
- ONNX Runtime: cross-platform, high performance scoring engine for ML models☆65Updated this week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆84Updated last year
- Stretching GPU performance for GEMMs and tensor contractions.☆242Updated last week
- a c++/cuda template library for tensor lazy evaluation☆160Updated 2 years ago
- Header-only safetensors loader and saver in C++☆62Updated 3 weeks ago
- ☆106Updated last month
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib☆57Updated 2 years ago
- Computation using data flow graphs for scalable machine learning☆67Updated this week
- Converter from MegEngine to other frameworks☆69Updated 2 years ago
- Universal cross-platform tokenizers binding to HF and sentencepiece☆342Updated last week
- An Open Convolutional Neural Network Framework in C++ From Scratch☆64Updated 4 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)☆401Updated 4 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…☆388Updated last week
- Benchmark code for the "Online normalizer calculation for softmax" paper☆94Updated 6 years ago
- symmetric int8 gemm☆66Updated 5 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆83Updated 2 years ago
- ☆36Updated 7 months ago
- CVFusion is an open-source deep learning compiler to fuse the OpenCV operators.☆29Updated 2 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators☆401Updated this week
- Tencent NCNN with added CUDA support☆69Updated 4 years ago
- Example of using pytorch's open device registration API☆30Updated 2 years ago
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it☆572Updated last week
- CUDA Kernel Benchmarking Library☆656Updated last week
- GPU Stress Test is a tool to stress the compute engine of NVIDIA Tesla GPUs by running a BLAS matrix multiply using different data types…☆94Updated last month
- ☆124Updated last year
- A Toolkit to Help Optimize Onnx Model☆153Updated this week
- ☆23Updated 2 years ago
- ☆26Updated 2 years ago