Ahead of Time (AOT) Triton Math Library
☆93 Mar 3, 2026 Updated this week
Alternatives and similar repositories for aotriton
Users who are interested in aotriton are comparing it to the libraries listed below.
- ☆65 Updated this week
- AI Tensor Engine for ROCm ☆360 Updated this week
- A Triton JIT runtime and ffi provider in C++ ☆32 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆523 Updated this week
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 6 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 Feb 27, 2026 Updated last week
- Development repository for the Triton language and compiler ☆141 Feb 27, 2026 Updated last week
- A PyTorch native platform for training generative AI models ☆15 Nov 18, 2025 Updated 3 months ago
- ☆169 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆32 Feb 16, 2026 Updated 2 weeks ago
- Benchmark tests supporting the TiledCUDA library. ☆18 Nov 19, 2024 Updated last year
- Fast and memory-efficient exact attention ☆221 Feb 26, 2026 Updated last week
- extensible collectives library in triton ☆96 Mar 31, 2025 Updated 11 months ago
- TVMScript kernel for deformable attention ☆25 Dec 15, 2021 Updated 4 years ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 Sep 19, 2025 Updated 5 months ago
- ☆17 Jan 1, 2024 Updated 2 years ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆45 Updated this week
- super repo for rocm libraries ☆268 Updated this week
- Ship correct and fast LLM kernels to PyTorch ☆144 Jan 14, 2026 Updated last month
- Shared Middle-Layer for Triton Compilation ☆329 Dec 5, 2025 Updated 3 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 Feb 24, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 Feb 27, 2026 Updated last week
- GVProf: A Value Profiler for GPU-based Clusters ☆53 Mar 24, 2024 Updated last year
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 Feb 13, 2026 Updated 3 weeks ago
- Primus-SaFE(Stability and Fault Endurance) ☆52 Updated this week
- ☆178 May 7, 2025 Updated 9 months ago
- study of cutlass ☆22 Nov 10, 2024 Updated last year
- ☆30 Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 Jun 28, 2025 Updated 8 months ago
- DeeperGEMM: crazy optimized version ☆74 May 5, 2025 Updated 10 months ago
- ☆118 May 19, 2025 Updated 9 months ago
- ☆23 Apr 25, 2023 Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆26 Feb 26, 2026 Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆177 Feb 27, 2026 Updated last week
- ☆12 Mar 13, 2023 Updated 2 years ago
- Pytorch routines for (Ker)nel (Mac)hines ☆11 Oct 10, 2025 Updated 4 months ago
- A toolkit for developers to simplify the transformation of nn.Module instances. It's now corresponding to Pytorch.fx. ☆13 Apr 7, 2023 Updated 2 years ago
- Automated bottleneck detection and solution orchestration ☆19 Feb 24, 2026 Updated last week
- ☆23 Jul 11, 2025 Updated 7 months ago