NVIDIA-DOCA/gpunetio

Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)

☆64

Alternatives and similar repositories for gpunetio

Users who are interested in gpunetio are comparing it to the libraries listed below.

NVIDIA-DOCA / doca-samples
☆56 · Jul 15, 2026 · Updated last week
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆565 · Updated this week
leepoly / sm-profiler
☆83 · Feb 5, 2026 · Updated 5 months ago
google / nccl-plugin-gpudirecttcpx
☆19 · May 8, 2026 · Updated 2 months ago
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
Mellanox / nv_peer_memory
☆399 · Apr 23, 2024 · Updated 2 years ago
dsl-learn / cuda-magic
fake CUTLASS to get performance
☆26 · Apr 28, 2026 · Updated 2 months ago
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47 · Nov 19, 2025 · Updated 8 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,153 · Updated this week
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
NVIDIA / hoti-2025-gpu-comms-tutorial
Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025
☆32 · Oct 22, 2025 · Updated 9 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆233 · Apr 3, 2026 · Updated 3 months ago
NVIDIA / nccl-extensions
Communication patterns for AI, built on top of NCCL device and host APIs
☆20 · Updated this week
Hyaloid / AccSpMM
Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
☆17 · Nov 13, 2025 · Updated 8 months ago
flashinfer-ai / cubloaty
a size profiler for cuda binary
☆71 · Jan 15, 2026 · Updated 6 months ago
microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆152 · Jun 24, 2026 · Updated last month
ROCm / DeepEP
☆15 · Jun 30, 2026 · Updated 3 weeks ago
madsys-dev / smart
Scaling Up Memory Disaggregated Applications with SMART (ASPLOS 24): Predecessor of Mooncake TE
☆35 · Updated this week
aliyun / SimCCL
☆42 · Nov 5, 2024 · Updated last year
NVIDIA / doca-platform
DOCA Platform manages provisioning and service orchestration for Bluefield DPUs
☆92 · Updated this week
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Jul 14, 2026 · Updated last week
alibaba / elastic-rdma-drivers
Official repository of Alibaba erdma drivers
☆37 · May 15, 2026 · Updated 2 months ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆53 · Jul 8, 2025 · Updated last year
rocmarchive / ROCnRDMA
ROCm Driver RDMA Peer to Peer Support
☆22 · Mar 21, 2019 · Updated 7 years ago
linux-rdma / perftest
Infiniband Verbs Performance Tests
☆999 · Jul 12, 2026 · Updated 2 weeks ago
inclusionAI / asystem-amem
An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆113 · Dec 17, 2025 · Updated 7 months ago
NVIDIA / cuEmbed
CUDA Embedding Lookup Kernel Library
☆48 · Jun 26, 2026 · Updated last month
FZJ-JSC / tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
☆380 · Jun 26, 2026 · Updated last month
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆111 · Jul 3, 2026 · Updated 3 weeks ago
ROCm / mori
Modular RDMA Interface
☆158 · Updated this week
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆106 · May 15, 2026 · Updated 2 months ago
muriloboratto / NVSHEMEM
Sample Codes using NVSHMEM on Multi-GPU
☆30 · Jan 22, 2023 · Updated 3 years ago