IaroslavElistratov / triton-autodiff
☆18Updated last week
Alternatives and similar repositories for triton-autodiff
Users who are interested in triton-autodiff are comparing it to the libraries listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best!☆61Updated last week
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP☆138Updated 2 months ago
- How to ensure correctness and ship LLM generated kernels in PyTorch☆121Updated last week
- Automatic differentiation for Triton Kernels☆30Updated 3 months ago
- ☆28Updated 10 months ago
- train with kittens!☆63Updated last year
- Learning about CUDA by writing PTX code.☆147Updated last year
- Quantized LLM training in pure CUDA/C++.☆216Updated this week
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 7 months ago
- High-Performance SGEMM on CUDA devices☆112Updated 10 months ago
- A bunch of kernels that might make stuff slower 😉☆64Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI.☆150Updated 2 years ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆158Updated last week
- ring-attention experiments☆155Updated last year
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆171Updated last week
- Experiment of using Tangent to autodiff triton☆80Updated last year
- ☆12Updated 2 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel☆195Updated 2 years ago
- Make triton easier☆48Updated last year
- SIMD quantization kernels☆92Updated 2 months ago
- Hand-Rolled GPU communications library☆65Updated this week
- ☆69Updated last week
- extensible collectives library in triton☆91Updated 7 months ago
- seqax = sequence modeling + JAX☆168Updated 4 months ago
- Parallel framework for training and fine-tuning deep neural networks☆68Updated 2 weeks ago
- TORCH_LOGS parser for PT2☆64Updated last week
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models☆50Updated 2 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.☆64Updated 10 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆47Updated 3 months ago
- Collection of kernels written in Triton language☆167Updated 7 months ago