lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆18 Updated 11 months ago
Alternatives and similar repositories for triton_kernels_for_fun_and_profit:
Users who are interested in triton_kernels_for_fun_and_profit are comparing it to the libraries listed below:
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆34 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆127 Updated last year
- ☆191 Updated this week
- Learn CUDA with PyTorch ☆19 Updated last month
- Cataloging released Triton kernels. ☆204 Updated 2 months ago
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- ring-attention experiments ☆127 Updated 5 months ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆18 Updated 5 months ago
- Collection of kernels written in Triton language ☆114 Updated last month
- ☆151 Updated last year
- extensible collectives library in triton ☆84 Updated 6 months ago
- Applied AI experiments and examples for PyTorch ☆249 Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 8 months ago
- Fast low-bit matmul kernels in Triton ☆267 Updated this week
- ML/DL Math and Method notes ☆58 Updated last year
- Google TPU optimizations for transformers models ☆103 Updated 2 months ago
- Make triton easier ☆47 Updated 9 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆189 Updated this week
- Normalized Transformer (nGPT) ☆162 Updated 4 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆54 Updated last month
- ☆158 Updated last month
- ☆43 Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆222 Updated 7 months ago
- ☆136 Updated 2 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆234 Updated this week
- Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 Updated 11 months ago