daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆11 · Updated last month
Alternatives and similar repositories for triton_bwd
Users who are interested in triton_bwd are comparing it to the libraries listed below.
- Framework to reduce autotune overhead to zero for well-known deployments. ☆70 · Updated this week
- ☆79 · Updated 6 months ago
- Extensible collectives library in Triton. ☆86 · Updated last month
- DeeperGEMM: crazy optimized version. ☆69 · Updated last week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆85 · Updated last week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆124 · Updated this week
- ☆58 · Updated 3 weeks ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning. ☆23 · Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton). ☆42 · Updated 2 months ago
- Debug print operator for CUDA graph debugging. ☆10 · Updated 9 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores. ☆38 · Updated 9 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆138 · Updated this week
- ☆33 · Updated this week
- ☆27 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆40 · Updated this week
- ☆13 · Updated 2 months ago
- ☆29 · Updated last year
- Ahead-of-Time (AOT) Triton Math Library. ☆63 · Updated this week
- ☆15 · Updated last week
- ☆34 · Updated this week
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆38 · Updated 2 years ago
- ☆13 · Updated 5 months ago
- Artifacts of EVT ASPLOS'24. ☆24 · Updated last year
- Thunder Research Group's Collective Communication Library. ☆36 · Updated last year
- Effective transpose on Hopper GPU. ☆18 · Updated 2 weeks ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators. ☆38 · Updated 2 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters. ☆45 · Updated 9 months ago
- ☆104 · Updated 8 months ago
- Microsoft Collective Communication Library. ☆65 · Updated 5 months ago
- ☆27 · Updated 4 months ago