MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.
☆60 Updated last year
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆262 Updated this week
- MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into … ☆202 Updated last week
- The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface ☆119 Updated 2 months ago
- Header-only safetensors loader and saver in C++ ☆69 Updated 5 months ago
- ☆126 Updated this week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it ☆631 Updated 2 weeks ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆24 Updated 2 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆460 Updated last week
- Large Language Model ONNX Inference Framework ☆36 Updated 9 months ago
- ONNX Runtime: cross-platform, high performance scoring engine for ML models ☆72 Updated this week
- K-means clustering with multi-GPU capabilities ☆119 Updated 2 years ago
- A TensorFlow Extension: GPU performance tools for TensorFlow. ☆26 Updated 2 years ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more! ☆53 Updated this week
- ☆124 Updated last year
- Computation using data flow graphs for scalable machine learning ☆68 Updated last week
- Common utilities for ONNX converters ☆283 Updated last month
- Standalone Flash Attention v2 kernel without libtorch dependency ☆112 Updated last year
- Common libraries for PPL projects ☆29 Updated 7 months ago
- A C++/CUDA template library for tensor lazy evaluation ☆163 Updated 2 years ago
- Example of using PyTorch's open device registration API ☆30 Updated 3 years ago
- ☆37 Updated last year
- Conversion to/from half-precision floating point formats ☆372 Updated 2 months ago
- OpenVINO Intel NPU Compiler ☆73 Updated last week
- A converter for llama2.c legacy models to ncnn models. ☆80 Updated last year
- Learn OpenCL step by step. ☆135 Updated 3 years ago
- A Toolkit to Help Optimize Large ONNX Model ☆161 Updated last year
- Repository for OpenVINO's extra modules ☆145 Updated 2 weeks ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆83 Updated 2 years ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… ☆66 Updated this week