Puzzles for learning Triton
☆2,336 Mar 18, 2026 Updated this week
Alternatives and similar repositories for Triton-Puzzles
Users interested in Triton-Puzzles are comparing it to the libraries listed below.
- ☆307 Updated this week
- Puzzles for learning Triton, play it with minimal environment configuration! ☆640 Dec 28, 2025 Updated 2 months ago
- What would you do with 1000 H100s... ☆1,157 Jan 10, 2024 Updated 2 years ago
- Tile primitives for speedy kernels ☆3,232 Updated this week
- Development repository for the Triton language and compiler ☆18,656 Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆597 Aug 12, 2025 Updated 7 months ago
- Puzzles for exploring transformers ☆387 May 4, 2023 Updated 2 years ago
- Cataloging released Triton kernels. ☆298 Sep 9, 2025 Updated 6 months ago
- FlashInfer: Kernel Library for LLM Serving ☆5,145 Updated this week
- Efficient Triton Kernels for LLM Training ☆6,204 Mar 12, 2026 Updated last week
- Material for gpu-mode lectures ☆5,841 Feb 1, 2026 Updated last month
- Solve puzzles. Improve your pytorch. ☆3,976 Jul 15, 2024 Updated last year
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,630 Updated this week
- Solve puzzles. Learn CUDA. ☆11,997 Sep 1, 2024 Updated last year
- GPU programming related news and material links ☆2,047 Mar 8, 2026 Updated last week
- Distributed Compiler based on Triton for Parallel Systems ☆1,386 Mar 11, 2026 Updated last week
- ☆498 Oct 18, 2024 Updated last year
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,156 Mar 12, 2026 Updated last week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆5,364 Updated this week
- A PyTorch native platform for training generative AI models ☆5,139 Updated this week
- Applied AI experiments and examples for PyTorch ☆319 Aug 22, 2025 Updated 6 months ago
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆9,872 Mar 12, 2026 Updated last week
- Fast low-bit matmul kernels in Triton ☆438 Feb 1, 2026 Updated last month
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,442 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆330 Updated this week
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆917 Updated this week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆466 Mar 10, 2025 Updated last year
- Fast and memory-efficient exact attention ☆22,832 Updated this week
- Helpful tools and examples for working with flex-attention ☆1,157 Feb 8, 2026 Updated last month
- PyTorch native quantization and sparsity for training and inference ☆2,730 Updated this week
- Ring attention implementation with flash attention ☆996 Sep 10, 2025 Updated 6 months ago
- A Quirky Assortment of CuTe Kernels ☆861 Updated this week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆852 Mar 9, 2026 Updated last week
- how to optimize some algorithm in cuda. ☆2,863 Mar 11, 2026 Updated last week
- Annotated version of the Mamba paper ☆499 Feb 27, 2024 Updated 2 years ago
- Collection of kernels written in Triton language ☆184 Jan 27, 2026 Updated last month
- Machine Learning Engineering Open Book ☆17,361 Mar 11, 2026 Updated last week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆2,116 Aug 26, 2025 Updated 6 months ago
- Minimalistic large language model 3D-parallelism training ☆2,609 Feb 19, 2026 Updated last month