AI Tensor Engine for ROCm
☆356 · Feb 21, 2026 · Updated last week
Alternatives and similar repositories for aiter
Users who are interested in aiter are comparing it to the libraries listed below.
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆521 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆92 · Updated this week
- Modular RDMA Interface ☆84 · Updated this week
- ☆60 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 · Feb 20, 2026 · Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆172 · Feb 21, 2026 · Updated last week
- ☆30 · Feb 11, 2026 · Updated 2 weeks ago
- amdgpu example code in hip/asm ☆55 · Feb 16, 2026 · Updated last week
- Fast and Furious AMD Kernels ☆368 · Feb 15, 2026 · Updated last week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 · Aug 29, 2025 · Updated 5 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆127 · Nov 14, 2025 · Updated 3 months ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆791 · Updated this week
- MAD (Model Automation and Dashboarding) ☆31 · Feb 11, 2026 · Updated 2 weeks ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 6 months ago
- Ongoing research training transformer models at scale ☆37 · Feb 20, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 · Feb 16, 2026 · Updated last week
- Fast and memory-efficient exact attention ☆221 · Updated this week
- Development repository for the Triton language and compiler ☆140 · Updated this week
- The C++ Standard Library for your entire system. ☆26 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 · Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 · Sep 19, 2025 · Updated 5 months ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 · Updated this week
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Distributed Compiler based on Triton for Parallel Systems ☆1,361 · Feb 13, 2026 · Updated 2 weeks ago
- ☆96 · Feb 18, 2026 · Updated last week
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆52 · Updated this week
- super repo for rocm libraries ☆259 · Updated this week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆751 · Aug 6, 2025 · Updated 6 months ago
- AMD’s C++ library for accelerating tensor primitives ☆49 · Feb 18, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆255 · Feb 10, 2026 · Updated 2 weeks ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆195 · Updated this week
- AMD's graph optimization engine. ☆279 · Updated this week
- Ship correct and fast LLM kernels to PyTorch ☆142 · Jan 14, 2026 · Updated last month
- ☆112 · Apr 19, 2024 · Updated last year
- ☆168 · Updated this week
- TORCH_TRACE parser for PT2 ☆78 · Feb 12, 2026 · Updated 2 weeks ago
- Kubernetes operator which sets up all platform tools to have a cluster ready for applications to run. ☆17 · Feb 20, 2026 · Updated last week
- ☆46 · Feb 20, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆411 · Updated this week