ROCm/FlyDSL

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

☆237

Alternatives and similar repositories for FlyDSL

Users who are interested in FlyDSL compare it to the libraries listed below.

ROCm / aiter
AI Tensor Engine for ROCm
☆497 · Updated this week
ROCm / ATOM
AiTer Optimized Model
☆141 · Updated this week
carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 · Jul 9, 2026 · Updated last week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
ROCm / mori
Modular RDMA Interface
☆151 · Updated this week
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆444 · Jul 10, 2026 · Updated last week
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 · Jul 3, 2026 · Updated 2 weeks ago
ROCm / rocprof-compute-viewer
☆61 · Updated this week
ROCm / gfx950-gluon-tutorials
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆38 · Updated this week
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
ROCm / TransformerEngine
☆72 · Updated this week
ColfaxResearch / layout-categories
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
☆139 · Sep 24, 2025 · Updated 9 months ago
iree-org / wave
Wave: Python Domain-Specific Language for High Performance Machine Learning
☆58 · Jun 29, 2026 · Updated 3 weeks ago
ROCm / rocprof-trace-decoder
☆17 · Apr 10, 2026 · Updated 3 months ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆251 · Jun 21, 2026 · Updated 3 weeks ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,063 · Updated this week
toyaix / triton-runner
Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin and hasco.
☆98 · May 8, 2026 · Updated 2 months ago
AMD-AGI / GEAK
Generating Efficient AI-Centric Kernels
☆121 · Updated this week
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆178 · Updated this week
ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆538 · Updated this week
ROCm / tritonBLAS
A lightweight Triton-based General Matrix Multiplication (GEMM) library.
☆65 · Jun 13, 2026 · Updated last month
NVIDIA / SOL-ExecBench
A benchmark of real-world DL kernel problems
☆257 · Updated this week
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
ROCm / rocmProfileData
☆30 · Jun 16, 2026 · Updated last month
triton-lang / Triton-to-tile-IR
Incubator repo for CUDA-TileIR backend
☆148 · Jul 10, 2026 · Updated last week
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆776 · Updated this week
mayankagarwals / MLSys-FlashLinfer-Contest
☆48 · Updated this week
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
NVIDIA / cuda-tile
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base…
☆999 · Jul 6, 2026 · Updated 2 weeks ago
mit-han-lab / KernelWiki
☆310 · Jun 9, 2026 · Updated last month
patrick-toulme / pyptx
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
☆367 · Jul 9, 2026 · Updated last week
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆151 · Jan 14, 2026 · Updated 6 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 2 weeks ago
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆534 · Updated this week
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation & Optimization via Deep Agents
☆486 · Updated this week
DeepLink-org / DLCompiler
Triton for DSA
☆68 · Jul 10, 2026 · Updated last week
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago