Modular RDMA Interface
☆94Mar 18, 2026Updated this week
Alternatives and similar repositories for mori
Users that are interested in mori are comparing it to the libraries listed below
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming☆181Mar 13, 2026Updated last week
- AI Tensor Engine for ROCm☆381Mar 13, 2026Updated last week
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)☆37Jul 30, 2025Updated 7 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training☆18Dec 19, 2024Updated last year
- Primus-SaFE(Stability and Fault Endurance)☆53Updated this week
- ☆30Mar 2, 2026Updated 2 weeks ago
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism☆27Apr 4, 2025Updated 11 months ago
- Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)☆33Mar 12, 2026Updated last week
- ☆50Mar 5, 2024Updated 2 years ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.☆38Aug 29, 2025Updated 6 months ago
- AiTer Optimized Model☆45Updated this week
- Benchmarks to capture important workloads.☆32Mar 6, 2026Updated 2 weeks ago
- An opinionated list of must-read papers on Distributed Systems☆31Oct 11, 2022Updated 3 years ago
- ☆11Jun 29, 2021Updated 4 years ago
- ☆10Jun 28, 2025Updated 8 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆14Dec 9, 2024Updated last year
- ☆97Mar 13, 2026Updated last week
- Python library to add support for embedding natural code in Python with shared program state.☆24Jan 20, 2026Updated 2 months ago
- ☆48Mar 10, 2026Updated last week
- A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs☆12Dec 17, 2024Updated last year
- NVIDIA Networking NIC Configuration Operator For Kubernetes☆15Updated this week
- ☆17Oct 22, 2020Updated 5 years ago
- Optimal Transport and Optimization related experiments.☆10Jul 22, 2018Updated 7 years ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆93Sep 11, 2025Updated 6 months ago
- Memory Topology for GPUs☆19Mar 4, 2026Updated 2 weeks ago
- Implementation of ADMM-based sparse CNN architecture.☆12Aug 30, 2017Updated 8 years ago
- ☆11Apr 23, 2020Updated 5 years ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 7 months ago
- A lightweight design for computation-communication overlap.☆225Jan 20, 2026Updated 2 months ago
- Distributed Compiler based on Triton for Parallel Systems☆1,386Mar 11, 2026Updated last week
- ☆17Feb 12, 2025Updated last year
- ☆169Feb 5, 2026Updated last month
- [DEPRECATED] Moved to ROCm/rocm-systems repo☆146Mar 10, 2026Updated last week
- amdgpu example code in hip/asm☆56Mar 2, 2026Updated 2 weeks ago
- ☆42Nov 5, 2024Updated last year
- ☆64Updated this week
- A deep model for speech recognition using Keras (front end) and TensorFlow (back end).☆12Feb 16, 2023Updated 3 years ago
- Spatial Transformer Network (STN) provides attention to a particular region in an image by applying a transformation to the input image. T…☆15Dec 21, 2020Updated 5 years ago
- Phoenix dataplane system service☆55Feb 3, 2026Updated last month