IaroslavElistratov / triton-autodiff
☆18 · Updated 2 months ago
Alternatives and similar repositories for triton-autodiff
Users interested in triton-autodiff are comparing it to the libraries listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆66 · Updated last week
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆141 · Updated 4 months ago
- Ship correct and fast LLM kernels to PyTorch ☆132 · Updated this week
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- ☆28 · Updated last year
- An implementation of the transformer architecture as an NVIDIA CUDA kernel ☆202 · Updated 2 years ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆185 · Updated this week
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆154 · Updated 2 years ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆188 · Updated 3 weeks ago
- SIMD quantization kernels ☆93 · Updated 4 months ago
- Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools ☆89 · Updated last week
- Quantized LLM training in pure CUDA/C++. ☆231 · Updated this week
- ☆83 · Updated last month
- Hand-rolled GPU communications library ☆76 · Updated last month
- ☆12 · Updated 4 months ago
- Extensible collectives library in Triton ☆92 · Updated 9 months ago
- ☆178 · Updated last year
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 · Updated 11 months ago
- Evaluating Large Language Models for CUDA Code Generation: ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆91 · Updated last week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆711 · Updated this week
- Learning about CUDA by writing PTX code. ☆151 · Updated last year
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆62 · Updated last week
- Collection of kernels written in Triton language ☆174 · Updated 9 months ago
- A bunch of kernels that might make stuff slower 😉 ☆73 · Updated last week
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆50 · Updated 3 weeks ago
- ring-attention experiments ☆161 · Updated last year
- TORCH_LOGS parser for PT2 ☆70 · Updated 2 weeks ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- ☆271 · Updated last week
- ☆91 · Updated last year