MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.
☆61Updated 2 months ago
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below.
- AMD's graph optimization engine.☆275Updated this week
- A nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface☆139Updated last month
- MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into …☆208Updated this week
- ☆137Updated last week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it☆681Updated this week
- Header-only safetensors loader and saver in C++☆78Updated last month
- ONNX Runtime: cross-platform, high performance scoring engine for ML models☆78Updated this week
- An easy way to run, test, benchmark and tune OpenCL kernel files☆24Updated 2 years ago
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib☆58Updated 2 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆85Updated last year
- ☆125Updated 2 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)☆448Updated this week
- a c++/cuda template library for tensor lazy evaluation☆164Updated 2 years ago
- Universal cross-platform tokenizers binding to HF and sentencepiece☆451Updated 2 weeks ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…☆507Updated last week
- ☆64Updated last year
- kmeans clustering with multi-GPU capabilities☆122Updated 2 years ago
- Computation using data flow graphs for scalable machine learning☆68Updated this week
- ☆62Updated 3 years ago
- Common utilities for ONNX converters☆294Updated last month
- Conversion to/from half-precision floating point formats☆379Updated 5 months ago
- AI-related samples made available by the DevTech ProViz team☆33Updated last year
- C99/C++ header-only library for division via fixed-point multiplication by inverse☆59Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆254Updated last week
- A Toolkit to Help Optimize Large Onnx Model☆163Updated 3 months ago
- ☆172Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆114Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency☆114Updated last year
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope…☆70Updated this week
- Ahead of Time (AOT) Triton Math Library☆88Updated last week