Examples of how to call collective operation functions in multi-GPU environments. A simple example covering the broadcast, reduce, allGather, reduceScatter, and sendRecv operations.
☆35 · Aug 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for NCCL
Users that are interested in NCCL are comparing it to the libraries listed below.
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆20 · May 29, 2018 · Updated 7 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆36 · Sep 15, 2023 · Updated 2 years ago
- Tutorials for NVIDIA CUPTI samples ☆61 · Nov 3, 2025 · Updated 5 months ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 · Dec 11, 2020 · Updated 5 years ago
- Pure Rust implementation of the post-quantum secure digital signature scheme FAEST ☆19 · Apr 6, 2026 · Updated last week
- A basic repository for a Clang-based tool, with CMake integration. ☆10 · Sep 22, 2023 · Updated 2 years ago
- Combined solution from Matter Labs and Yrrid based on their respective submissions for the Z-Prize category Accelerating MSM Operations o… ☆16 · Oct 30, 2023 · Updated 2 years ago
- TLLM_QMM strips the implementation of quantized kernels of Nvidia's TensorRT-LLM, removing NVInfer dependency and exposes ease of use Pyt… ☆16 · Jul 5, 2024 · Updated last year
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS ☆25 · Feb 9, 2021 · Updated 5 years ago
- ☆166 · Dec 27, 2024 · Updated last year
- OPHELib is an optimized library for partially homomorphic encryption. It currently provides an implementation of the Paillier encryption … ☆15 · May 29, 2019 · Updated 6 years ago
- Datalog Engines OPtimization Tester. ☆13 · Jan 18, 2024 · Updated 2 years ago
- Knowledge-Augmented Language Models for Cause-Effect Relation Classification https://arxiv.org/abs/2112.08615 ☆14 · Jun 14, 2023 · Updated 2 years ago
- ☆26 · Feb 17, 2025 · Updated last year
- ☆16 · Aug 20, 2024 · Updated last year
- RISCV Core written in Calyx ☆17 · Aug 16, 2024 · Updated last year
- ☆12 · Jul 9, 2021 · Updated 4 years ago
- A QEMU + gem5 co-simulation framework for AMD MI300X GPU research. ☆43 · Updated this week
- An Easy-to-understand TensorOp Matmul Tutorial ☆423 · Mar 5, 2026 · Updated last month
- ☆27 · Jan 8, 2024 · Updated 2 years ago
- Automated testing for XML XPath execution ☆18 · Jan 5, 2024 · Updated 2 years ago
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters ☆21 · Apr 21, 2023 · Updated 2 years ago
- Distributed k-nearest Neighbors using Locality Sensitive Hashing and SYCL ☆10 · Jun 7, 2021 · Updated 4 years ago
- Surrogate-based Hyperparameter Tuning System ☆30 · Jun 29, 2023 · Updated 2 years ago
- ☆21 · Sep 22, 2024 · Updated last year
- Collection of validation scripts, notebooks, results ☆11 · Dec 23, 2025 · Updated 3 months ago
- CSCI 3753 - Operating Systems, Spring 2015 ☆12 · Feb 28, 2021 · Updated 5 years ago
- NCCL Tests ☆1,485 · Mar 11, 2026 · Updated last month
- A repository for sharing D3.js plugins. ☆12 · Jun 4, 2015 · Updated 10 years ago
- VASim is a virtual homogeneous non-deterministic finite automata simulator and transformation tool. VASim can parse, transform, … ☆36 · May 17, 2024 · Updated last year
- The code to reproduce CVPR 2021 paper "Towards Robust Classification Model by Counterfactual and Invariant Data Generation" ☆16 · Jul 29, 2021 · Updated 4 years ago
- Auto-differentiation library for C++ ☆12 · Jan 16, 2022 · Updated 4 years ago
- iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud. ☆39 · Jun 11, 2024 · Updated last year
- Benchmarks for python ☆27 · Jun 6, 2025 · Updated 10 months ago
- Sample Codes using NVSHMEM on Multi-GPU ☆30 · Jan 22, 2023 · Updated 3 years ago
- Fortran bindings to the C++ Standard Library. ☆34 · Apr 7, 2025 · Updated last year
- A depletion framework for OpenMC ☆15 · Nov 27, 2017 · Updated 8 years ago
- OpenTracing example ☆14 · Aug 19, 2024 · Updated last year
- PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents ☆27 · Mar 26, 2025 · Updated last year