Examples of how to call collective operation functions in multi-GPU environments, including simple examples of the broadcast, reduce, allGather, reduceScatter and sendRecv operations.
☆35 · Aug 28, 2023 · Updated 2 years ago
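For readers unfamiliar with the operations named in the description, the following is a minimal pure-Python sketch of what each collective does to per-rank buffers. It is a semantic model only, not the repository's code and not NCCL's API: the function names, the `Buffers` type, and the choice of sum as the reduction are assumptions made for illustration, and no GPU is involved.

```python
from typing import List

# Hypothetical model: one list of ints per simulated rank (not an NCCL type).
Buffers = List[List[int]]


def broadcast(bufs: Buffers, root: int) -> Buffers:
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(bufs[root]) for _ in bufs]


def reduce_sum(bufs: Buffers, root: int) -> Buffers:
    """Only the root receives the element-wise sum.

    Modeling simplification: non-root ranks simply keep their input.
    """
    total = [sum(col) for col in zip(*bufs)]
    return [list(total) if r == root else list(b) for r, b in enumerate(bufs)]


def all_gather(bufs: Buffers) -> Buffers:
    """Every rank receives all buffers concatenated in rank order."""
    gathered = [x for b in bufs for x in b]
    return [list(gathered) for _ in bufs]


def reduce_scatter_sum(bufs: Buffers) -> Buffers:
    """Element-wise sum, then rank r keeps only the r-th equal-sized chunk."""
    n = len(bufs)
    total = [sum(col) for col in zip(*bufs)]
    chunk = len(total) // n  # assumes the buffer length is divisible by n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


def send_recv(bufs: Buffers, src: int, dst: int) -> Buffers:
    """Point-to-point transfer: dst's buffer is replaced by src's buffer."""
    out = [list(b) for b in bufs]
    out[dst] = list(bufs[src])
    return out
```

With two simulated ranks holding `[1, 2]` and `[3, 4]`, `all_gather` gives both ranks `[1, 2, 3, 4]`, while `reduce_scatter_sum` leaves rank 0 with `[4]` and rank 1 with `[6]`.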
Alternatives and similar repositories for NCCL
Users that are interested in NCCL are comparing it to the libraries listed below.
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆20 · May 29, 2018 · Updated 7 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆37 · Sep 15, 2023 · Updated 2 years ago
- Tutorials for NVIDIA CUPTI samples ☆67 · Nov 3, 2025 · Updated 6 months ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 · Dec 11, 2020 · Updated 5 years ago
- TLLM_QMM strips the implementation of quantized kernels of Nvidia's TensorRT-LLM, removing NVInfer dependency and exposes ease of use Pyt… ☆16 · Jul 5, 2024 · Updated last year
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS ☆25 · Feb 9, 2021 · Updated 5 years ago
- Benchmarks ☆19 · May 14, 2026 · Updated last week
- ☆167 · Dec 27, 2024 · Updated last year
- Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid abs… ☆17 · Jan 27, 2018 · Updated 8 years ago
- Accelerating MSM Operations on GPU/FPGA ☆15 · Sep 16, 2022 · Updated 3 years ago
- Datalog Engines OPtimization Tester. ☆13 · Jan 18, 2024 · Updated 2 years ago
- Knowledge-Augmented Language Models for Cause-Effect Relation Classification https://arxiv.org/abs/2112.08615 ☆14 · Jun 14, 2023 · Updated 2 years ago
- ☆26 · Feb 17, 2025 · Updated last year
- Möbius Transformation for Fast Inner Product Search on Graph ☆23 · Jun 3, 2021 · Updated 4 years ago
- ☆13 · Jul 9, 2021 · Updated 4 years ago
- GEMM by WMMA (tensor core) ☆15 · Jul 31, 2022 · Updated 3 years ago
- GPU-accelerated AES encryption project ☆11 · Feb 13, 2015 · Updated 11 years ago
- ☆27 · Jan 8, 2024 · Updated 2 years ago
- Surrogate-based Hyperparameter Tuning System ☆30 · Jun 29, 2023 · Updated 2 years ago
- Collection of validation scripts, notebooks, results ☆11 · Dec 23, 2025 · Updated 5 months ago
- Memory footprint reduction for transformer models ☆11 · Jan 24, 2023 · Updated 3 years ago
- NCCL Tests ☆1,529 · Updated this week
- A C++11 high-performance web server that supports multi-threaded and single-threaded modes, uses the Reactor model, and follows the muduo library's one-loop-per-thread design ☆12 · Aug 3, 2023 · Updated 2 years ago
- VASim is a virtual homogeneous non-deterministic finite automata simulator and transformation tool. VASim can parse, transform, … ☆36 · May 17, 2024 · Updated 2 years ago
- Script for doing Slurm Calculations ☆12 · Mar 21, 2025 · Updated last year
- Auto-differentiation library for C++ ☆12 · Jan 16, 2022 · Updated 4 years ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud. ☆39 · Jun 11, 2024 · Updated last year
- Pypi Fetcher for Nix with simplified interface. (contains hashes for all packages) ☆15 · Nov 7, 2023 · Updated 2 years ago
- Sample Codes using NVSHMEM on Multi-GPU ☆30 · Jan 22, 2023 · Updated 3 years ago
- ☆39 · Jul 13, 2022 · Updated 3 years ago
- A depletion framework for OpenMC ☆15 · Nov 27, 2017 · Updated 8 years ago
- Distributed IO-aware Attention algorithm ☆24 · Sep 24, 2025 · Updated 8 months ago
- An attempt at a fully configured Nix environment in a Docker image. ☆18 · Dec 4, 2019 · Updated 6 years ago
- Scalable radix top-k selection on GPUs. ☆23 · Jan 27, 2025 · Updated last year
- Parallel implementation of a convolution filter using MPI and optionally OpenMP ☆10 · Mar 8, 2015 · Updated 11 years ago
- AutodiffEngine ☆13 · Apr 1, 2019 · Updated 7 years ago
- LLVM-Canon aims to transform LLVM modules into a canonical form by reordering and renaming instructions while preserving the same semanti… ☆31 · Apr 30, 2024 · Updated 2 years ago
- LaTeX template for students of The University of Alabama to write their thesis or dissertation ☆13 · Dec 4, 2018 · Updated 7 years ago