lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆20 Updated last year
Alternatives and similar repositories for triton_kernels_for_fun_and_profit
Users who are interested in triton_kernels_for_fun_and_profit compare it to the repositories listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- Collection of kernels written in Triton language ☆125 Updated 2 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆67 Updated 2 months ago
- ☆215 Updated this week
- A bunch of kernels that might make stuff slower 😉 ☆46 Updated this week
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- ring-attention experiments ☆143 Updated 7 months ago
- Cataloging released Triton kernels. ☆226 Updated 4 months ago
- extensible collectives library in triton ☆87 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆311 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆181 Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆223 Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆133 Updated last year
- Learn CUDA with PyTorch ☆21 Updated this week
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆184 Updated last week
- making the official triton tutorials actually comprehensible ☆34 Updated 2 months ago
- ☆157 Updated last year
- Explore training for quantized models ☆18 Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs ☆56 Updated 3 weeks ago
- Make triton easier ☆47 Updated 11 months ago
- ☆28 Updated 4 months ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 Updated 7 months ago
- Tutorials for Triton, a language for writing GPU kernels ☆18 Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆127 Updated this week
- Applied AI experiments and examples for PyTorch ☆271 Updated last week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 Updated 10 months ago
- ☆116 Updated 2 weeks ago
- ☆80 Updated 6 months ago