alexzhang13 / Triton-Puzzles-SolutionsLinks

Personal solutions to the Triton Puzzles

☆20

Alternatives and similar repositories for Triton-Puzzles-Solutions

Users that are interested in Triton-Puzzles-Solutions are comparing it to the libraries listed below

Sorting:

srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆63Updated this week
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45Updated last year
Jokeren / triton-samples
☆28Updated 9 months ago
gpu-mode / ring-attention
ring-attention experiments
☆155Updated last year
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆58Updated 2 weeks ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
stanford-futuredata / stk
☆112Updated last year
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27Updated last year
HazyResearch / train-tk
train with kittens!
☆63Updated last year
cchan / tccl
extensible collectives library in triton
☆90Updated 7 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆197Updated 4 months ago
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆97Updated last week
UmerHA / triton_util
Make triton easier
☆48Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102Updated 2 weeks ago
test-time-training / ttt-tk
☆41Updated 2 weeks ago
gpu-mode / triton-tutorials
☆15Updated 5 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆159Updated 6 months ago
Dao-AILab / gemm-cublas
☆23Updated 5 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 6 months ago
andylolu2 / simpleGEMM
The simplest but fast implementation of matrix multiplication in CUDA.
☆39Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
Dao-AILab / grouped-latent-attention
☆130Updated 5 months ago
cloneofsimo / min-fsdp
☆91Updated last year
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆58Updated 2 weeks ago
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆59Updated 2 weeks ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆71Updated 8 months ago
axonn-ai / axonn
Parallel framework for training and fine-tuning deep neural networks
☆65Updated last week
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆103Updated 2 weeks ago