ROCm/aotriton

Ahead of Time (AOT) Triton Math Library

☆100

Alternatives and similar repositories for aotriton

Users who are interested in aotriton are comparing it to the libraries listed below.

ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week

ROCm / TransformerEngine
☆72 · Updated this week

ROCm / aiter
AI Tensor Engine for ROCm
☆503 · Updated this week

ROCm / triton
Development repository for the Triton language and compiler
☆146 · Updated this week

ROCm / rocMLIR
☆185 · Updated this week

iree-org / wave
Wave: Python Domain-Specific Language for High Performance Machine Learning
☆58 · Jun 29, 2026 · Updated 3 weeks ago

ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆539 · Updated this week

ROCm / tritonBLAS
A lightweight triton-based General Matrix Multiplication (GEMM) library.
☆66 · Updated this week

ROCm / hipBLASLt
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆114 · Updated this week

ROCm / rocmProfileData
☆30 · Updated this week

ROCm / flash-attention
Fast and memory-efficient exact attention
☆234 · Jul 16, 2026 · Updated last week

ROCm / rocm-libraries
super repo for rocm libraries
☆390 · Updated this week

ROCm / gfx950-gluon-tutorials
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆41 · Updated this week

ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆140 · Apr 10, 2026 · Updated 3 months ago

ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆249 · Updated this week

ROCm / rocWMMA
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆140 · Jul 13, 2026 · Updated last week

ROCm / pyrsmi
python package of rocm-smi-lib
☆25 · Dec 15, 2025 · Updated 7 months ago

ROCm / ATOM
AiTer Optimized Model
☆144 · Updated this week

flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆1,057 · Updated this week

vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week

YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago

ROCm / rocprofiler-sdk
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆30 · May 28, 2026 · Updated last month

dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆477 · Jul 15, 2026 · Updated last week

AMD-AGI / GEAK
Generating Efficient AI-Centric Kernels
☆131 · Updated this week

peichenxie / FPRev
☆26 · May 9, 2025 · Updated last year

ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week

Oneflow-Inc / oneflow-lite
☆17 · Jan 1, 2024 · Updated 2 years ago

meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆362 · Updated this week

ROCm / hrx-system
HRX: Hip Runtime Extended
☆19 · Updated this week

cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year

ROCm / llvm-project
This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific…
☆225 · Updated this week

AMD-AGI / torchtitan-amd
A PyTorch native platform for training generative AI models
☆17 · Jun 30, 2026 · Updated 3 weeks ago

meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆151 · Jan 14, 2026 · Updated 6 months ago

TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year

flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week

CRobeck / instrument-amdgpu-kernels
LLVM/MLIR based compiler instrumentation of AMD GPU kernels
☆21 · Jul 13, 2025 · Updated last year

nod-ai / ossci-fleet
The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a …
☆13 · Apr 28, 2026 · Updated 2 months ago

yester31 / Cutlass_EX
study of cutlass
☆22 · Nov 10, 2024 · Updated last year

microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year