IaroslavElistratov / triton-autodiff
☆12 · Updated 3 weeks ago
Alternatives and similar repositories for triton-autodiff
Users interested in triton-autodiff are comparing it to the libraries listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 · Updated this week
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆96 · Updated this week
- High-performance SGEMM on CUDA devices ☆97 · Updated 5 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆135 · Updated last year
- Learning about CUDA by writing PTX code ☆133 · Updated last year
- An implementation of the transformer architecture as an Nvidia CUDA kernel ☆188 · Updated last year
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆46 · Updated 2 weeks ago
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆131 · Updated this week
- PyTorch Single Controller ☆325 · Updated this week
- ☆88 · Updated last year
- ☆28 · Updated 6 months ago
- Custom kernels in the Triton language for accelerating LLMs ☆23 · Updated last year
- Solve puzzles. Learn CUDA. ☆64 · Updated last year
- Experiment of using Tangent to autodiff Triton ☆79 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆55 · Updated last week
- Learn CUDA with PyTorch ☆29 · Updated this week
- Official problem sets / reference kernels for the GPU MODE leaderboard! ☆67 · Updated this week
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 3 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks ☆64 · Updated 5 months ago
- Extensible collectives library in Triton ☆87 · Updated 3 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆361 · Updated last week
- ☆225 · Updated last week
- SIMD quantization kernels ☆73 · Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆51 · Updated last week
- ☆322 · Updated 3 weeks ago
- train with kittens! ☆61 · Updated 8 months ago
- seqax = sequence modeling + JAX ☆165 · Updated last month
- Learn GPU programming in Mojo🔥 by solving puzzles ☆87 · Updated last week
- Collection of kernels written in the Triton language ☆136 · Updated 3 months ago
- A parallel framework for training deep neural networks ☆62 · Updated 4 months ago