AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆181 · Mar 13, 2026 · Updated last week
Alternatives and similar repositories for iris
Users that are interested in iris are comparing it to the libraries listed below
- Modular RDMA Interface ☆94 · Updated this week
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 7 months ago
- ☆64 · Updated this week
- AI Tensor Engine for ROCm ☆381 · Mar 13, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆146 · Mar 10, 2026 · Updated last week
- GitHub mirror of triton-lang/triton repo. ☆150 · Updated this week
- ☆64 · Updated this week
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- Intel® SHMEM - Device initiated shared memory based communication library ☆32 · Nov 12, 2025 · Updated 4 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆127 · Nov 14, 2025 · Updated 4 months ago
- A PyTorch native platform for training generative AI models ☆15 · Nov 18, 2025 · Updated 4 months ago
- pytorch ucc plugin ☆23 · Jul 8, 2021 · Updated 4 years ago
- Distributed Compiler based on Triton for Parallel Systems ☆1,386 · Mar 11, 2026 · Updated last week
- Tile primitives for speedy kernels ☆3,232 · Updated this week
- How to use node-local MPI rank IDs to manually map MPI ranks to GPUs ☆14 · Apr 22, 2020 · Updated 5 years ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆477 · Mar 10, 2026 · Updated last week
- ☆48 · Mar 10, 2026 · Updated last week
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model ☆77 · Mar 11, 2026 · Updated last week
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 · Sep 19, 2025 · Updated 6 months ago
- Development repository for the Triton language and compiler ☆143 · Mar 13, 2026 · Updated last week
- Open ABI and FFI for Machine Learning Systems ☆361 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆94 · Updated this week
- torchcomms: a modern PyTorch communications API ☆349 · Updated this week
- ☆261 · Jul 11, 2024 · Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆481 · Updated this week
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆149 · May 10, 2025 · Updated 10 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 7 months ago
- ☆52 · May 19, 2025 · Updated 10 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆929 · Mar 13, 2026 · Updated last week
- Perplexity GPU Kernels ☆566 · Nov 7, 2025 · Updated 4 months ago
- A Quirky Assortment of CuTe Kernels ☆861 · Updated this week
- Ship correct and fast LLM kernels to PyTorch ☆145 · Jan 14, 2026 · Updated 2 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆28 · Oct 26, 2023 · Updated 2 years ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,156 · Mar 12, 2026 · Updated last week
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆20 · Jan 24, 2025 · Updated last year
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆267 · Mar 13, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 · Mar 13, 2026 · Updated last week
- An MLIR-based compiler from C/C++ to AMD-Xilinx Versal AIE ☆17 · Aug 5, 2022 · Updated 3 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆196 · Updated this week