Mellanox / gpu_direct_rdma_access
Example code that uses a DC QP (Dynamically Connected queue pair) to provide RDMA READ and WRITE access to remote GPU memory
☆152 · Jul 30, 2024 · Updated last year
Alternatives and similar repositories for gpu_direct_rdma_access
Users who are interested in gpu_direct_rdma_access compare it to the libraries listed below.
- ☆384 · Apr 23, 2024 · Updated last year
- A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology ☆1,339 · Dec 17, 2025 · Updated last month
- Infiniband Verbs Performance Tests ☆910 · Jan 11, 2026 · Updated last month
- ☆217 · Nov 23, 2025 · Updated 2 months ago
- NVIDIA GPU direct RDMA using SISCI API ☆17 · Apr 8, 2018 · Updated 7 years ago
- NCCL Profiling Kit ☆152 · Jul 1, 2024 · Updated last year
- A tutorial on RDMA based programming using code examples ☆597 · Jan 3, 2020 · Updated 6 years ago
- GPUDirect example ☆62 · Oct 19, 2021 · Updated 4 years ago
- RDMA and SHARP plugins for nccl library ☆223 · Jan 12, 2026 · Updated last month
- https://rs3lab.github.io/SynCord/ ☆26 · Nov 23, 2022 · Updated 3 years ago
- Arbitrary offloads for RDMA NICs ☆99 · Apr 25, 2022 · Updated 3 years ago
- NVIDIA GPUDirect Storage Driver ☆331 · Dec 18, 2025 · Updated last month
- ☆21 · Dec 22, 2025 · Updated last month
- Repo for OSDI 2023 paper: "Ship your Critical Section Not Your Data: Enabling Transparent Delegation with TCLocks" ☆21 · Nov 6, 2024 · Updated last year
- Unified Collective Communication Library ☆291 · Jan 30, 2026 · Updated 2 weeks ago
- ☆15 · Oct 30, 2025 · Updated 3 months ago
- A practical way of learning Swizzle ☆36 · Feb 3, 2025 · Updated last year
- RDMA core userspace libraries and daemons ☆2,138 · Updated this week
- A collection of awesome researchers and papers about disaggregated memory. ☆180 · Oct 14, 2025 · Updated 4 months ago
- Stateful LLM Serving ☆95 · Mar 11, 2025 · Updated 11 months ago
- A lightweight C++ RDMA library for InfiniBand networks. ☆208 · May 12, 2022 · Updated 3 years ago
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆63 · Dec 19, 2025 · Updated last month
- GPUDirect Async support for IB Verbs ☆135 · Nov 10, 2022 · Updated 3 years ago
- Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group) ☆1,573 · Updated this week
- Paper list of federated learning: About system design ☆13 · Apr 13, 2022 · Updated 3 years ago
- GPUDirect Async suite ☆17 · Dec 5, 2018 · Updated 7 years ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 · Jan 26, 2026 · Updated 2 weeks ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Oct 13, 2024 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory ☆38 · Apr 19, 2023 · Updated 2 years ago
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 6 months ago
- LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism ☆89 · Dec 24, 2021 · Updated 4 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- This is the implementation repository of our FAST'23 paper: FUSEE: A Fully Memory-Disaggregated Key-Value Store. ☆60 · Feb 14, 2023 · Updated 2 years ago
- ☆47 · Dec 13, 2024 · Updated last year
- Build userspace NVMe drivers and storage applications with CUDA support ☆416 · Dec 18, 2023 · Updated 2 years ago
- ☆24 · Jun 24, 2022 · Updated 3 years ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated last year