srush/Triton-Puzzles

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/srush/Triton-Puzzles)

srush / Triton-Puzzles

Puzzles for learning Triton

☆2,324

Alternatives and similar repositories for Triton-Puzzles

Users that are interested in Triton-Puzzles are comparing it to the libraries listed below

Sorting:

Deep-Learning-Profiling-Tools / triton-viz
View on GitHub
☆301Updated this week
SiriusNEO / Triton-Puzzles-Lite
View on GitHub
Puzzles for learning Triton, play it with minimal environment configuration!
☆634Dec 28, 2025Updated 2 months ago
srush / LLM-Training-Puzzles
View on GitHub
What would you do with 1000 H100s...
☆1,154Jan 10, 2024Updated 2 years ago
HazyResearch / ThunderKittens
View on GitHub
Tile primitives for speedy kernels
☆3,202Feb 24, 2026Updated last week
BobMcDear / attorch
View on GitHub
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆595Aug 12, 2025Updated 6 months ago
triton-lang / triton
View on GitHub
Development repository for the Triton language and compiler
☆18,501Updated this week
gpu-mode / triton-index
View on GitHub
Cataloging released Triton kernels.
☆295Sep 9, 2025Updated 5 months ago
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆5,057Updated this week
linkedin / Liger-Kernel
View on GitHub
Efficient Triton Kernels for LLM Training
☆6,162Feb 27, 2026Updated last week
gpu-mode / lectures
View on GitHub
Material for gpu-mode lectures
☆5,800Feb 1, 2026Updated last month
srush / Transformer-Puzzles
View on GitHub
Puzzles for exploring transformers
☆386May 4, 2023Updated 2 years ago
fla-org / flash-linear-attention
View on GitHub
🚀 Efficient implementations of state-of-the-art linear attention models
☆4,474Updated this week
srush / Tensor-Puzzles
View on GitHub
Solve puzzles. Improve your pytorch.
☆3,966Jul 15, 2024Updated last year
gpu-mode / resource-stream
View on GitHub
GPU programming related news and material links
☆2,010Sep 17, 2025Updated 5 months ago
srush / GPU-Puzzles
View on GitHub
Solve puzzles. Learn CUDA.
☆11,970Sep 1, 2024Updated last year
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,371Feb 13, 2026Updated 3 weeks ago
pytorch / torchtitan
View on GitHub
A PyTorch native platform for training generative AI models
☆5,098Updated this week
tile-ai / tilelang
View on GitHub
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆5,284Updated this week
srush / Autodiff-Puzzles
View on GitHub
☆498Oct 18, 2024Updated last year
meta-pytorch / applied-ai
View on GitHub
Applied AI experiments and examples for PyTorch
☆319Aug 22, 2025Updated 6 months ago
mirage-project / mirage
View on GitHub
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,145Feb 23, 2026Updated last week
NVIDIA / cutlass
View on GitHub
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆9,348Updated this week
dropbox / gemlite
View on GitHub
Fast low-bit matmul kernels in Triton
☆436Feb 1, 2026Updated last month
meta-pytorch / tritonbench
View on GitHub
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆327Updated this week
flagos-ai / FlagGems
View on GitHub
FlagGems is an operator library for large language models implemented in the Triton Language.
☆909Updated this week
meta-pytorch / attention-gym
View on GitHub
Helpful tools and examples for working with flex-attention
☆1,140Feb 8, 2026Updated 3 weeks ago
xlite-dev / LeetCUDA
View on GitHub
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆9,755Feb 25, 2026Updated last week
pytorch / ao
View on GitHub
PyTorch native quantization and sparsity for training and inference
☆2,707Updated this week
ScalingIntelligence / KernelBench
View on GitHub
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆820Feb 27, 2026Updated last week
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆838Updated this week
Dao-AILab / flash-attention
View on GitHub
Fast and memory-efficient exact attention
☆22,460Updated this week
zhuzilin / ring-flash-attention
View on GitHub
Ring attention implementation with flash attention
☆987Sep 10, 2025Updated 5 months ago
srush / annotated-mamba
View on GitHub
Annotated version of the Mamba paper
☆497Feb 27, 2024Updated 2 years ago
rkinas / triton-resources
View on GitHub
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆461Mar 10, 2025Updated 11 months ago
BBuf / how-to-optim-algorithm-in-cuda
View on GitHub
how to optimize some algorithm in cuda.
☆2,841Updated this week
huggingface / nanotron
View on GitHub
Minimalistic large language model 3D-parallelism training
☆2,579Feb 19, 2026Updated 2 weeks ago
stas00 / ml-engineering
View on GitHub
Machine Learning Engineering Open Book
☆17,286Feb 21, 2026Updated last week
shawntan / scattermoe
View on GitHub
Triton-based implementation of Sparse Mixture of Experts.
☆268Oct 3, 2025Updated 5 months ago
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,176Updated this week