ROCm/composable_kernel

ROCm / composable_kernel

[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror

☆538

Alternatives and similar repositories for composable_kernel

Users that are interested in composable_kernel are comparing it to the libraries listed below.

ROCm / aiter
AI Tensor Engine for ROCm
☆497 · Updated this week
ROCm / flash-attention
Fast and memory-efficient exact attention
☆234 · Updated this week
carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 · Jul 9, 2026 · Updated last week
ROCm / rocWMMA
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆140 · Jul 13, 2026 · Updated last week
ROCm / rocprofiler-compute
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆165 · May 28, 2026 · Updated last month
ROCm / Tensile
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆260 · Updated this week
ROCm / AMDMIGraphX
AMD's graph optimization engine.
☆318 · Updated this week
ROCm / TransformerEngine
☆72 · Updated this week
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆100 · Jul 13, 2026 · Updated last week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆139 · Apr 10, 2026 · Updated 3 months ago
ROCm / triton
Development repository for the Triton language and compiler
☆146 · Updated this week
ROCm / hipBLASLt
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆114 · Updated this week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
ROCm / rocMLIR
☆183 · Updated this week
ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆237 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
ROCm / rocm-libraries
super repo for rocm libraries
☆389 · Updated this week
cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆609 · Apr 20, 2023 · Updated 3 years ago
ROCm / MIOpen
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆1,190 · May 28, 2026 · Updated last month
ColfaxResearch / cutlass-kernels
☆269 · Jul 11, 2024 · Updated 2 years ago
ROCm / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆12 · Jun 24, 2024 · Updated 2 years ago
ROCm / rocmProfileData
☆30 · Jun 16, 2026 · Updated last month
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,725 · Updated this week
KnowingNothing / MatmulTutorial
A Easy-to-understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
ROCm / rocprofiler
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆152 · May 28, 2026 · Updated last month
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,552 · Jul 13, 2026 · Updated last week
AMDResearch / intellikit
IntelliKit is a collection of intelligent tools designed to make GPU kernel development, profiling, and validation accessible to LLMs and…
☆25 · Updated this week
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344 · Aug 28, 2025 · Updated 10 months ago
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769 · Aug 6, 2025 · Updated 11 months ago
nod-ai / ossci-fleet
The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a …
☆13 · Apr 28, 2026 · Updated 2 months ago
ROCm / rocm-blogs
☆81 · Updated this week
ROCm / rccl
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆419 · Updated this week
ROCm / roctracer
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆83 · May 28, 2026 · Updated last month
ColfaxResearch / cfx-article-src
☆192 · May 7, 2025 · Updated last year
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆932 · Dec 30, 2024 · Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆114 · Apr 19, 2024 · Updated 2 years ago
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Updated this week