MyCaffe / NCCL
Windows version of NVIDIA's NCCL ('Nickel') for multi-GPU training - please use https://github.com/NVIDIA/nccl for changes.
☆60 · Updated last year
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆228 · Updated this week
- A nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface ☆111 · Updated 3 months ago
- An easy way to run, test, benchmark and tune OpenCL kernel files ☆23 · Updated last year
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it ☆591 · Updated last week
- ONNX Runtime: cross-platform, high-performance scoring engine for ML models ☆65 · Updated this week
- Common utilities for ONNX converters ☆274 · Updated 2 weeks ago
- MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into … ☆197 · Updated this week
- Header-only safetensors loader and saver in C++ ☆63 · Updated 2 months ago
- A Toolkit to Help Optimize Large ONNX Models ☆157 · Updated last year
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆413 · Updated 2 weeks ago
- ☆111 · Updated last week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- OneFlow->ONNX ☆43 · Updated 2 years ago
- ☆124 · Updated last year
- Universal cross-platform tokenizers binding to HF and sentencepiece ☆359 · Updated 3 weeks ago
- A Toolkit to Help Optimize ONNX Models ☆174 · Updated last week
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆83 · Updated 2 years ago
- CVFusion is an open-source deep learning compiler to fuse the OpenCV operators. ☆30 · Updated 2 years ago
- Tencent NCNN with added CUDA support ☆69 · Updated 4 years ago
- OpenCL Tutorials ☆53 · Updated 5 years ago
- ☆40 · Updated 2 years ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 · Updated last week
- Symmetric int8 GEMM ☆66 · Updated 5 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆437 · Updated this week
- A C++/CUDA template library for lazy tensor evaluation ☆161 · Updated 2 years ago
- Conversion to/from half-precision floating point formats ☆355 · Updated 11 months ago
- Common libraries for PPL projects ☆29 · Updated 4 months ago
- LLM deployment project based on ONNX. ☆42 · Updated 9 months ago
- PyTorch C++ API Documentation ☆230 · Updated this week
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib ☆58 · Updated 2 years ago