daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆29Aug 12, 2025Updated 6 months ago
Alternatives and similar repositories for triton_bwd
Users who are interested in triton_bwd are comparing it to the libraries listed below.
- ☆39Dec 14, 2025Updated 2 months ago
- ☆18Nov 11, 2025Updated 3 months ago
- ☆14Apr 24, 2024Updated last year
- ☆42Jan 24, 2026Updated 3 weeks ago
- A Triton-only attention backend for vLLM☆23Updated this week
- This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for details.☆18Dec 23, 2025Updated last month
- a size profiler for cuda binary☆72Jan 15, 2026Updated last month
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- A Top-Down Profiler for GPU Applications☆22Feb 29, 2024Updated last year
- Triton kernels for Flux☆22Jul 7, 2025Updated 7 months ago
- ☆288Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 7 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆49Aug 18, 2025Updated 5 months ago
- cuJSON: A Highly Parallel JSON Parser for GPUs☆38Dec 12, 2025Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉☆75Updated this week
- ☆65Apr 26, 2025Updated 9 months ago
- Cataloging released Triton kernels.☆294Sep 9, 2025Updated 5 months ago
- Ship correct and fast LLM kernels to PyTorch☆141Jan 14, 2026Updated last month
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆194Updated this week
- ☆26Dec 3, 2025Updated 2 months ago
- ☆29Nov 16, 2019Updated 6 years ago
- Experiment of using Tangent to autodiff triton☆82Jan 22, 2024Updated 2 years ago
- ☆38Jul 19, 2025Updated 6 months ago
- word2vec source code with detailed bilingual annotations (well-annotated word2vec)☆10Oct 3, 2021Updated 4 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆326Updated this week
- [NeurIPS '25] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents☆64Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo☆154Jan 21, 2026Updated 3 weeks ago
- An android VoIP application using native SIP API & ConnectionService (CallKit in iOS) API☆10Mar 13, 2020Updated 5 years ago
- Official codebase for "Context Aware Deep Learning for Multi Modal Depression Detection" [ICASSP 2019, Oral]☆11Dec 26, 2024Updated last year
- CUTLASS and CuTe Examples☆128Nov 30, 2025Updated 2 months ago
- ☆104Nov 7, 2024Updated last year
- A lightweight design for computation-communication overlap.☆221Jan 20, 2026Updated 3 weeks ago
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.☆94Feb 23, 2023Updated 2 years ago
- Shared Middle-Layer for Triton Compilation☆329Dec 5, 2025Updated 2 months ago
- ☆14Mar 8, 2025Updated 11 months ago
- ☆18Jun 6, 2025Updated 8 months ago
- contents to be displayed at our projects-page☆17Dec 18, 2023Updated 2 years ago
- Large language models to diffusion finetuning code☆23Jun 2, 2025Updated 8 months ago
- Hack to start another instance of the wpa_supplicant daemon☆13Nov 16, 2017Updated 8 years ago