muriloboratto / NCCL
Examples of how to call collective operation functions in multi-GPU environments. Simple examples of the broadcast, reduce, allGather, reduceScatter and sendRecv operations.
☆35 · Aug 28, 2023 · Updated 2 years ago
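
As a rough illustration of what the five operations named above do, here is a minimal sketch in plain Python. It is not code from the repository and it does not call NCCL. It only models, on ordinary lists, what each rank's buffer holds after the operation. The function names and the "one list per rank" representation are assumptions made for this example.

```python
from functools import reduce as fold
from operator import add

# Each "rank" (GPU) is modeled as one Python list; `buffers[r]` is rank r's data.
# All names here are hypothetical and chosen only for illustration.

def broadcast(buffers, root):
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]

def reduce(buffers, root, op=add):
    """Element-wise reduction across ranks; only the root receives the result.
    Non-root ranks are shown as None (their receive buffer is not written)."""
    result = [fold(op, column) for column in zip(*buffers)]
    return [result if rank == root else None for rank in range(len(buffers))]

def all_gather(buffers):
    """Every rank receives all ranks' buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]

def reduce_scatter(buffers, op=add):
    """Element-wise reduction, then rank r keeps the r-th equal-sized chunk.
    Assumes the buffer length is divisible by the number of ranks."""
    n = len(buffers)
    reduced = [fold(op, column) for column in zip(*buffers)]
    chunk = len(reduced) // n
    return [reduced[r * chunk:(r + 1) * chunk] for r in range(n)]

def send_recv(buffers, src, dst):
    """Point-to-point transfer: dst's buffer is overwritten with src's data."""
    out = [list(b) for b in buffers]
    out[dst] = list(buffers[src])
    return out
```

For two ranks holding `[1, 2]` and `[3, 4]`, `reduce(..., root=0)` gives `[[4, 6], None]`, `all_gather` gives `[1, 2, 3, 4]` on both ranks, and `reduce_scatter` gives `[4]` on rank 0 and `[6]` on rank 1.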
Alternatives and similar repositories for NCCL
Users who are interested in NCCL are comparing it to the libraries listed below.
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆20 · May 29, 2018 · Updated 7 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆35 · Sep 15, 2023 · Updated 2 years ago
- Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid abs… ☆17 · Jan 27, 2018 · Updated 8 years ago
- Benchmarks ☆17 · Apr 28, 2025 · Updated 9 months ago
- TLLM_QMM strips the implementation of quantized kernels of Nvidia's TensorRT-LLM, removing NVInfer dependency and exposes ease of use Pyt… ☆16 · Jul 5, 2024 · Updated last year
- ☆26 · Feb 17, 2025 · Updated 11 months ago
- Benchmarks for python ☆27 · Jun 6, 2025 · Updated 8 months ago
- ☆34 · Feb 19, 2024 · Updated last year
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 · Jan 28, 2026 · Updated 2 weeks ago
- An easy-to-understand TensorOp Matmul Tutorial ☆410 · Updated this week
- Information on how to set up Julia on HPC systems ☆38 · Jul 6, 2023 · Updated 2 years ago
- Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm ☆34 · May 20, 2022 · Updated 3 years ago
- Research on a time-bank system based on a mutual-aid elderly care model (Cheng Cheng) ☆10 · Nov 18, 2014 · Updated 11 years ago
- ext_mpi_collectives ☆11 · Apr 1, 2025 · Updated 10 months ago
- Memory Topology for GPUs ☆17 · Dec 9, 2025 · Updated 2 months ago
- Fortran bindings to the C++ Standard Library. ☆34 · Apr 7, 2025 · Updated 10 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆85 · Oct 8, 2019 · Updated 6 years ago
- VASim is a virtual homogeneous non-deterministic finite automata simulator and transformation tool. VASim can parse, transform, … ☆36 · May 17, 2024 · Updated last year
- Performance Counter Reader ☆11 · Sep 14, 2022 · Updated 3 years ago
- Converting the C-like language to binary or human readable SPIR-V ☆20 · Jul 22, 2020 · Updated 5 years ago
- GPU based 2D elastic FWI ☆11 · Mar 6, 2018 · Updated 7 years ago
- A smartphone specs API powered by the most trusted phone information website, GSMArena. ☆16 · Feb 1, 2024 · Updated 2 years ago
- 📦 A Command Line Tool for downloading protein structures, sequences and MSAs ☆10 · Nov 21, 2017 · Updated 8 years ago
- A KV store designed specifically for blockchain systems ☆11 · Mar 10, 2025 · Updated 11 months ago
- ☆10 · Feb 5, 2026 · Updated last week
- Sequential Parameter Optimization in Python ☆14 · Jan 12, 2026 · Updated last month
- derived from https://github.com/wilfredinni/python-cheatsheet ☆10 · Nov 8, 2023 · Updated 2 years ago
- ☆11 · Feb 27, 2024 · Updated last year
- Least-squares Reverse Time Migration using 1D scalar wave equation. Very simple and for demonstration purposes only. ☆10 · Sep 4, 2017 · Updated 8 years ago
- GitHub repository for "Big Data in Astrophysics" - Spring 2021 ☆14 · Apr 26, 2021 · Updated 4 years ago
- OpenMP offload playground ☆10 · Nov 16, 2024 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆42 · Jul 24, 2024 · Updated last year
- Implement BERT in pure C++ ☆37 · Apr 29, 2020 · Updated 5 years ago
- CSCI 3753 - Operating Systems, Spring 2015 ☆12 · Feb 28, 2021 · Updated 4 years ago
- Creates computational grids that can be used with ParallelStencil.jl or PETSc.jl ☆11 · Mar 14, 2023 · Updated 2 years ago
- Modified Shepard Algorithm for Interpolation of Scattered Multivariate Data ☆11 · May 28, 2022 · Updated 3 years ago
- Build tools for Open-CE ☆13 · Nov 13, 2025 · Updated 3 months ago
- A Java blockchain database implementation ☆10 · Feb 12, 2016 · Updated 10 years ago
- Infiniband RDMA Examples using libibverbs and librdmacm for learning purposes ☆12 · Sep 24, 2021 · Updated 4 years ago