pytorch-labs / tritonparse
TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code mappings.
☆14 · Updated this week
Alternatives and similar repositories for tritonparse
Users interested in tritonparse are comparing it to the libraries listed below.
- ☆13 · Updated 2 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 8 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 · Updated this week
- Personal solutions to the Triton Puzzles ☆18 · Updated 10 months ago
- Make triton easier ☆47 · Updated 11 months ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆16 · Updated 8 months ago
- ☆12 · Updated 11 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆60 · Updated last month
- ☆21 · Updated 3 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆45 · Updated 2 weeks ago
- Samples of good AI generated CUDA kernels ☆65 · Updated last week
- A bunch of kernels that might make stuff slower 😉 ☆46 · Updated this week
- ☆28 · Updated 4 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Updated last year
- TORCH_LOGS parser for PT2 ☆38 · Updated last week
- ☆16 · Updated 8 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 · Updated 10 months ago
- High-Performance SGEMM on CUDA devices ☆94 · Updated 4 months ago
- Automatic differentiation for Triton Kernels ☆11 · Updated 2 months ago
- Experiment of using Tangent to autodiff triton ☆79 · Updated last year
- Extensible collectives library in Triton ☆87 · Updated 2 months ago
- PyTorch centric eager mode debugger ☆47 · Updated 5 months ago
- Experimental GPU language with meta-programming ☆22 · Updated 9 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆15 · Updated 3 weeks ago
- Effective transpose on Hopper GPU ☆20 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 · Updated 2 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 · Updated 2 months ago
- ☆13 · Updated 3 weeks ago
- ☆49 · Updated 2 weeks ago
- [WIP] Better (FP8) attention for Hopper ☆30 · Updated 3 months ago