gpu-mode/Triton-Puzzles

gpu-mode / Triton-Puzzles

Puzzles for learning Triton

☆2,522

Alternatives and similar repositories for Triton-Puzzles

Users who are interested in Triton-Puzzles are comparing it to the repositories listed below.

Deep-Learning-Profiling-Tools / triton-viz
☆349 · Updated this week

SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆733 · Mar 17, 2026 · Updated 4 months ago

srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,180 · Jan 10, 2024 · Updated 2 years ago

HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,537 · Updated this week

triton-lang / triton
Development repository for the Triton language and compiler
☆19,684 · Updated this week

BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆606 · May 13, 2026 · Updated 2 months ago

gpu-mode / triton-index
Cataloging released Triton kernels.
☆311 · Sep 9, 2025 · Updated 10 months ago

srush / Transformer-Puzzles
Puzzles for exploring transformers
☆397 · May 4, 2023 · Updated 3 years ago

flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,962 · Updated this week

linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,509 · Updated this week

gpu-mode / lectures
Material for gpu-mode lectures
☆6,314 · Jun 15, 2026 · Updated last month

srush / Tensor-Puzzles
Solve puzzles. Improve your pytorch.
☆4,226 · Jul 15, 2024 · Updated 2 years ago

fla-org / flash-linear-attention
🚀 Efficient implementations for emerging model architectures
☆5,341 · Updated this week

srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆12,324 · Sep 1, 2024 · Updated last year

gpu-mode / resource-stream
GPU programming related news and material links
☆2,226 · Jun 15, 2026 · Updated last month

NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,080 · Updated this week

ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,489 · Jul 11, 2026 · Updated last week

srush / Autodiff-Puzzles
☆507 · Oct 18, 2024 · Updated last year

mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,369 · Updated this week

pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆5,533 · Updated this week

tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,642 · Updated this week

meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆322 · Aug 22, 2025 · Updated 10 months ago

dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆476 · Jun 30, 2026 · Updated 2 weeks ago

meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆361 · Updated this week

xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,541 · Updated this week

flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆1,049 · Updated this week

rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆496 · Mar 10, 2025 · Updated last year

zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆1,037 · Sep 10, 2025 · Updated 10 months ago

pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,901 · Updated this week

Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,460 · Updated this week

meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,205 · Updated this week

Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,056 · Updated this week

ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆1,131 · Mar 24, 2026 · Updated 3 months ago

BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆3,139 · Jul 8, 2026 · Updated last week

zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆200 · Jan 27, 2026 · Updated 5 months ago

srush / annotated-mamba
Annotated version of the Mamba paper
☆501 · Feb 27, 2024 · Updated 2 years ago

stas00 / ml-engineering
Machine Learning Engineering Open Book
☆18,407 · Jul 9, 2026 · Updated last week

huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,748 · May 26, 2026 · Updated last month

huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆2,246 · Aug 26, 2025 · Updated 10 months ago