daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆11 Updated this week
Alternatives and similar repositories for triton_bwd
Users who are interested in triton_bwd are comparing it to the libraries listed below.
- DeeperGEMM: crazy optimized version ☆69 Updated 2 months ago
- ☆84 Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆79 Updated this week
- extensible collectives library in triton ☆87 Updated 3 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆91 Updated 3 weeks ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 Updated 2 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆195 Updated this week
- ☆60 Updated 2 months ago
- ☆49 Updated 2 months ago
- ☆35 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆189 Updated this week
- ☆50 Updated last month
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆62 Updated last month
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆17 Updated last week
- Debug print operator for cudagraph debugging ☆12 Updated 11 months ago
- A bunch of kernels that might make stuff slower 😉 ☆55 Updated last week
- ☆226 Updated this week
- A Top-Down Profiler for GPU Applications ☆20 Updated last year
- ☆106 Updated 10 months ago
- ring-attention experiments ☆143 Updated 9 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 Updated 2 years ago
- Microsoft Collective Communication Library ☆63 Updated 8 months ago
- Thunder Research Group's Collective Communication Library ☆38 Updated 2 weeks ago
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆132 Updated this week
- Quantized Attention on GPU ☆44 Updated 8 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 Updated 4 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆40 Updated 11 months ago
- ☆27 Updated last year
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆40 Updated last year
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 Updated 7 months ago