IaroslavElistratov / triton-autodiff
☆14 · Updated 3 weeks ago
Alternatives and similar repositories for triton-autodiff
Users interested in triton-autodiff are comparing it to the libraries listed below.
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆133 · Updated last month
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆107 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 · Updated 2 weeks ago
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 7 months ago
- ☆28 · Updated 9 months ago
- High-Performance SGEMM on CUDA devices ☆107 · Updated 9 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆98 · Updated last week
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆146 · Updated 2 years ago
- SIMD quantization kernels ☆89 · Updated last month
- Experiment using Tangent to autodiff Triton ☆80 · Updated last year
- Custom kernels in the Triton language for accelerating LLMs ☆26 · Updated last year
- train with kittens! ☆63 · Updated last year
- ☆89 · Updated last year
- ring-attention experiments ☆155 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆63 · Updated this week
- Automatic differentiation for Triton kernels (a minimal sketch of the idea follows this list) ☆11 · Updated 2 months ago
- 👷 Build compute kernels ☆163 · Updated this week
- Quantized LLM training in pure CUDA/C++. ☆209 · Updated this week
- Make Triton easier ☆48 · Updated last year
- TORCH_LOGS parser for PT2 ☆62 · Updated last month
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆164 · Updated this week
- Extensible collectives library in Triton ☆90 · Updated 6 months ago
- Learn CUDA with PyTorch ☆95 · Updated last month
- An implementation of the transformer architecture as an NVIDIA CUDA kernel ☆191 · Updated 2 years ago
- Collection of kernels written in the Triton language ☆159 · Updated 6 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆69 · Updated 3 weeks ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆55 · Updated 2 weeks ago
- Triton-based Symmetric Memory operators and examples ☆48 · Updated last week
- LLM training in simple, raw C/CUDA ☆107 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆305 · Updated this week
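
Several of the entries above, such as the Tangent experiment and the automatic-differentiation project, revolve around differentiating Triton kernels. As context, here is a minimal, hypothetical sketch (not taken from any repository listed here) of a forward Triton kernel together with the hand-written backward kernel that such autodiff tools aim to derive automatically. It assumes `triton` and `torch` are installed and a CUDA device is available.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def square_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Forward: out[i] = x[i] ** 2, one block of BLOCK elements per program.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x * x, mask=mask)


@triton.jit
def square_grad_kernel(x_ptr, dy_ptr, dx_ptr, n, BLOCK: tl.constexpr):
    # Backward: d/dx (x^2) = 2x, so dx = 2 * x * dy. This is the rule an
    # autodiff tool would derive from the forward kernel above.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    dy = tl.load(dy_ptr + offs, mask=mask)
    tl.store(dx_ptr + offs, 2.0 * x * dy, mask=mask)


x = torch.randn(1024, device="cuda")
y = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
square_kernel[grid](x, y, x.numel(), BLOCK=256)

# Seed the backward pass with dy = 1 and check against the analytic gradient.
dy = torch.ones_like(x)
dx = torch.empty_like(x)
square_grad_kernel[grid](x, dy, dx, x.numel(), BLOCK=256)
assert torch.allclose(dx, 2 * x)
```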