abcdabcd987 / libfabric-efa-demo
☆77Jan 5, 2025Updated last year
Alternatives and similar repositories for libfabric-efa-demo
Users that are interested in libfabric-efa-demo are comparing it to the libraries listed below
- ☆19Jan 9, 2025Updated last year
- Ensō is a high-performance streaming interface for NIC-application communication.☆76Sep 4, 2025Updated 5 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆13Dec 9, 2024Updated last year
- A minimum demo for PyTorch distributed extension functionality for collectives.☆15Jul 29, 2024Updated last year
- The Artifact Evaluation Version of SOSP Paper #19☆52Aug 19, 2024Updated last year
- ☆55Jun 22, 2022Updated 3 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning☆13Nov 1, 2021Updated 4 years ago
- ☆15Apr 18, 2023Updated 2 years ago
- Scaling Up Memory Disaggregated Applications with SMART☆34Apr 23, 2024Updated last year
- ☆18Nov 1, 2021Updated 4 years ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA☆31Nov 27, 2025Updated 2 months ago
- Zebin Ren and Animesh Trivedi. 2023. Performance Characterization of Modern Storage Stacks: POSIX I/O, libaio, SPDK, and io_uring. In Pro…☆13Mar 30, 2023Updated 2 years ago
- Reducing P4 Language’s Voluminosity using Higher-Level Constructs☆15Oct 15, 2022Updated 3 years ago
- Training camp project (training track)☆27Jan 28, 2026Updated 2 weeks ago
- [ACM SoCC'22] Pisces: Efficient Federated Learning via Guided Asynchronous Training☆13Apr 28, 2025Updated 9 months ago
- Asynchronous Rust bindings for SPDK.☆17Nov 1, 2022Updated 3 years ago
- A parallel programming model for online applications with complex synchronization requirements.☆16Jun 8, 2022Updated 3 years ago
- EuroSys '22 - Rolis: a software approach to efficiently replicating multi-core transactions☆17Feb 28, 2024Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- GPU Admin Tools. Includes Confidential Computing controls for H100, and other functionality☆65Dec 2, 2025Updated 2 months ago
- ☆18Dec 11, 2023Updated 2 years ago
- A naive static HTTP server that solves the C10K problem☆17Jan 8, 2017Updated 9 years ago
- Efficient GPU communication over multiple NICs.☆22Nov 20, 2025Updated 2 months ago
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- Perplexity GPU Kernels☆560Nov 7, 2025Updated 3 months ago
- ☆16Apr 22, 2025Updated 9 months ago
- Examples of usage for Mellanox HW offloads☆17Jan 18, 2022Updated 4 years ago
- Blog post☆17Feb 16, 2024Updated last year
- Final project of PKU DIP 2018 (train ticket detection and recognition)☆12Jan 4, 2019Updated 7 years ago
- Examples of CUDA implementations by Cutlass CuTe☆270Jul 1, 2025Updated 7 months ago
- Distributed Compiler based on Triton for Parallel Systems☆1,350Updated this week
- Code for paper: End-to-end Stochastic Optimization with Energy-based Model☆16Feb 14, 2023Updated 3 years ago
- Alkali is an MLIR-based compiler infrastructure for SmartNICs. It allows developers to write target-independent programs, with the compile…☆25Sep 28, 2025Updated 4 months ago
- ☆21Dec 22, 2025Updated last month
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆17Mar 13, 2023Updated 2 years ago
- Package of Pathways-on-Cloud utilities☆25Updated this week
- A fully serverless implementation of the ZooKeeper coordination protocol.☆23Aug 20, 2024Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications☆462Feb 8, 2026Updated last week
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 5 months ago