ROCm/tritonBLAS

A lightweight triton-based General Matrix Multiplication (GEMM) library.

☆66

Alternatives and similar repositories for tritonBLAS

Users who are interested in tritonBLAS compare it to the libraries listed below.

ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
AMDResearch / intellikit
IntelliKit is a collection of intelligent tools designed to make GPU kernel development, profiling, and validation accessible to LLMs and…
☆27 · Updated this week
ROCm / hrx-system
HRX: Hip Runtime Extended
☆19 · Updated this week
iree-org / wave
Wave: Python Domain-Specific Language for High Performance Machine Learning
☆58 · Jun 29, 2026 · Updated 3 weeks ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
ROCm / pyrsmi
python package of rocm-smi-lib
☆25 · Dec 15, 2025 · Updated 7 months ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week
CRobeck / instrument-amdgpu-kernels
LLVM/MLIR based compiler instrumentation of AMD GPU kernels
☆21 · Jul 13, 2025 · Updated last year
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Jul 14, 2026 · Updated last week
ROCm / omnistat
Scale-out system monitoring
☆25 · Jul 17, 2026 · Updated last week
ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆139 · Sep 24, 2025 · Updated 10 months ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆100 · Updated this week
AMDResearch / intelliperf
Automated bottleneck detection and solution orchestration
☆23 · Feb 24, 2026 · Updated 5 months ago
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆140 · Apr 10, 2026 · Updated 3 months ago
ShujianQian / epic-eval
☆10 · May 15, 2024 · Updated 2 years ago
lbnlcomputerarch / MoSAIC-P38
MoSAIC: Modular system for Acceleration Integration MoSAIC
☆10 · Jul 16, 2026 · Updated last week
kaist-ina / Trinity-AE
Source code for Trinity (ASPLOS 2026)
☆25 · Apr 24, 2026 · Updated 3 months ago
ademeure / cuda-side-boost
☆60 · Feb 24, 2026 · Updated 5 months ago
ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆249 · Updated this week
vortexgpgpu / Volt
☆18 · Feb 9, 2026 · Updated 5 months ago
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week
DeepLink-org / DLCompiler
triton for dsa
☆68 · Jul 10, 2026 · Updated 2 weeks ago
NVlabs / SOLAR
Speed of Light Analysis for ML Model Runtime
☆106 · Jun 10, 2026 · Updated last month
vdcores / vdcores
Virtual Decoupled Cores: Composable Programming Framework and Runtime for Async GPUs
☆20 · Updated this week
open-ce / open-ce-builder
Build tools for Open-CE
☆13 · Nov 13, 2025 · Updated 8 months ago
ROCm / tensorcast
☆18 · Nov 10, 2025 · Updated 8 months ago
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
ROCm / TransformerEngine
☆72 · Updated this week
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆446 · Jul 10, 2026 · Updated 2 weeks ago
LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
luongthecong123 / fp8-quant-matmul
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19 · Feb 9, 2026 · Updated 5 months ago
ROCm / rocprofiler-compute
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆165 · May 28, 2026 · Updated last month
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆102 · Sep 19, 2025 · Updated 10 months ago
ROCm / rocprof-compute-viewer
☆62 · Jul 16, 2026 · Updated last week
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
ROCm / ATOM
AiTer Optimized Model
☆143 · Updated this week