Development repository for the Triton language and compiler
☆141 Feb 27, 2026 Updated last week
Alternatives and similar repositories for triton
Users who are interested in triton are comparing it to the libraries listed below.
- Fast and memory-efficient exact attention ☆221 Feb 26, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 Feb 27, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆523 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 Feb 27, 2026 Updated last week
- Ahead of Time (AOT) Triton Math Library ☆93 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆114 Feb 27, 2026 Updated last week
- ☆169 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆26 Feb 26, 2026 Updated last week
- ☆65 Updated this week
- CMake modules used within the ROCm libraries ☆73 Feb 23, 2026 Updated last week
- This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific… ☆207 Updated this week
- 8-bit CUDA functions for PyTorch ☆70 Sep 24, 2025 Updated 5 months ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆17 Feb 9, 2026 Updated 3 weeks ago
- A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch ☆25 Feb 26, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆84 Feb 11, 2026 Updated 3 weeks ago
- AI Tensor Engine for ROCm ☆360 Updated this week
- python package of rocm-smi-lib ☆24 Dec 15, 2025 Updated 2 months ago
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆25 Updated this week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 Jun 24, 2024 Updated last year
- AMD's graph optimization engine. ☆280 Updated this week
- 8-bit CUDA functions for PyTorch Rocm compatible ☆41 Mar 26, 2024 Updated last year
- ☆54 Mar 15, 2025 Updated 11 months ago
- The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a… ☆23 Feb 27, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 Feb 16, 2026 Updated 2 weeks ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. ☆34 Feb 26, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆148 Jan 27, 2026 Updated last month
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 Feb 27, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆390 Feb 24, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆256 Feb 27, 2026 Updated last week
- Bandwidth test for ROCm ☆79 Feb 23, 2026 Updated last week
- ☆20 Oct 11, 2023 Updated 2 years ago
- Shared Middle-Layer for Triton Compilation ☆329 Dec 5, 2025 Updated 3 months ago
- ☆38 Updated this week
- extensible collectives library in triton ☆96 Mar 31, 2025 Updated 11 months ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆909 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆154 Jan 21, 2026 Updated last month
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 Sep 19, 2025 Updated 5 months ago
- ☆23 Feb 24, 2026 Updated last week
- ☆23 Feb 17, 2026 Updated 2 weeks ago