HazyResearch/HipKittens

Fast and Furious AMD Kernels

☆444

Alternatives and similar repositories for HipKittens

Users that are interested in HipKittens are comparing it to the libraries listed below.

carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 · Jul 9, 2026 · Updated last week
ROCm / aiter
AI Tensor Engine for ROCm
☆497 · Updated this week
ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆237 · Updated this week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
ROCm / ATOM
AiTer Optimized Model
☆141 · Updated this week
ROCm / mori
Modular RDMA Interface
☆151 · Updated this week
AMD-AGI / Primus
A flexible and high-performance training framework designed for large-scale foundation model training on AMD GPUs
☆107 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,552 · Jul 13, 2026 · Updated last week
RadeonFlow / RadeonFlow_Kernels
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
☆79 · Feb 11, 2026 · Updated 5 months ago
ROCm / gfx950-gluon-tutorials
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆38 · Updated this week
seb-v / fp32_sgemm_amd
Super fast FP32 matrix multiplication on RDNA3
☆92 · Mar 30, 2025 · Updated last year
ROCm / tritonBLAS
A lightweight triton-based General Matrix Multiplication (GEMM) library.
☆65 · Jun 13, 2026 · Updated last month
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆910 · Updated this week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆1,148 · Mar 24, 2026 · Updated 3 months ago
ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆538 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,063 · Updated this week
AMD-AGI / TraceLens
Automating analysis from trace files
☆82 · Updated this week
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
HazyResearch / Megakernels
Kernels, of the mega variety :)
☆780 · May 26, 2026 · Updated last month
ROCm / rocprof-compute-viewer
☆61 · Updated this week
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,376 · Updated this week
HazyResearch / random_embedding
☆15 · Jun 10, 2022 · Updated 4 years ago
huggingface / hf-rocm-kernels
☆24 · May 26, 2026 · Updated last month
ROCm / rocmProfileData
☆30 · Jun 16, 2026 · Updated last month
open-lm-engine / coda-kernels
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
☆230 · Updated this week
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆178 · Updated this week
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
AMD-AGI / Primus-Turbo
A high-performance acceleration library dedicated to large-scale model training on AMD GPUs
☆67 · Updated this week
ROCm / TransformerEngine
☆72 · Updated this week
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆251 · Jun 21, 2026 · Updated last month
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,674 · Updated this week
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 2 weeks ago
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489 · Jul 5, 2026 · Updated 2 weeks ago
Snektron / gpumode-amd-fp8-mm
My submission for the GPUMODE/AMD fp8 mm challenge
☆29 · Jun 4, 2025 · Updated last year
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Jul 15, 2026 · Updated last week
facebookresearch / concurrentqa
This repo contains data and code for the paper "Reasoning over Public and Private Data in Retrieval-Based Systems."
☆47 · Jul 18, 2024 · Updated 2 years ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week