daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆11 Updated last month
Alternatives and similar repositories for triton_bwd:
Users that are interested in triton_bwd are comparing it to the libraries listed below
- Framework to reduce autotune overhead to zero for well known deployments. ☆65 Updated last week
- DeeperGEMM: crazy optimized version ☆67 Updated 3 weeks ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆82 Updated last week
- ☆78 Updated 5 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆116 Updated this week
- ☆55 Updated 2 weeks ago
- extensible collectives library in triton ☆85 Updated 3 weeks ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month
- ☆31 Updated this week
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 Updated 4 months ago
- ☆13 Updated last month
- A bunch of kernels that might make stuff slower 😉 ☆34 Updated this week
- Transformers components but in Triton ☆32 Updated last month
- ☆29 Updated this week
- ☆19 Updated 6 months ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆35 Updated last month
- A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆33 Updated last month
- ☆26 Updated last year
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆42 Updated last month
- Quantized Attention on GPU ☆45 Updated 5 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 9 months ago
- ☆26 Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 Updated 7 months ago
- ☆68 Updated 3 months ago
- ThrillerFlow is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 Updated 5 months ago
- Ahead of Time (AOT) Triton Math Library ☆58 Updated this week
- ☆13 Updated 4 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆84 Updated 5 months ago
- ☆68 Updated 4 months ago
- ☆9 Updated last year