meta-pytorch/tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

☆327

Alternatives and similar repositories for tritonbench

Users who are interested in tritonbench are comparing it to the repositories listed below.

microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆106 · Jun 28, 2025 · Updated 8 months ago
triton-lang / kernels
☆105 · Nov 7, 2024 · Updated last year
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well-known deployments.
☆97 · Sep 19, 2025 · Updated 5 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆436 · Feb 1, 2026 · Updated last month
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆74 · May 5, 2025 · Updated 10 months ago
IntelLabs / EquiTriton
EquiTriton is a project that seeks to implement high-performance kernels for commonly used building blocks in equivariant neural networks…
☆68 · Dec 16, 2025 · Updated 2 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆319 · Aug 22, 2025 · Updated 6 months ago
ColfaxResearch / cutlass-kernels
☆262 · Jul 11, 2024 · Updated last year
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,371 · Feb 13, 2026 · Updated 3 weeks ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated 10 months ago
flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆909 · Updated this week
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆194 · Jan 28, 2025 · Updated last year
Deep-Learning-Profiling-Tools / triton-viz
☆301 · Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆295 · Sep 9, 2025 · Updated 5 months ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆774 · Updated this week
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆18 · Nov 19, 2024 · Updated last year
thuml / depyf
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
☆790 · Oct 13, 2025 · Updated 4 months ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆196 · Updated this week
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated 9 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆820 · Feb 27, 2026 · Updated last week
cchan / tccl
extensible collectives library in triton
☆96 · Mar 31, 2025 · Updated 11 months ago
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,202 · Feb 24, 2026 · Updated last week
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆595 · Aug 12, 2025 · Updated 6 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆239 · Jun 15, 2025 · Updated 8 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆35 · Jul 29, 2025 · Updated 7 months ago
flashinfer-ai / debug-print
Debug print operator for cudagraph debugging
☆14 · Aug 2, 2024 · Updated last year
UmerHA / triton_util
Make triton easier
☆50 · Jun 12, 2024 · Updated last year
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆146 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆838 · Updated this week
ColfaxResearch / cfx-article-src
☆178 · May 7, 2025 · Updated 9 months ago
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆753 · Aug 6, 2025 · Updated 7 months ago
Dao-AILab / gemm-cublas
☆22 · May 5, 2025 · Updated 10 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆129 · Jun 24, 2025 · Updated 8 months ago
flagos-ai / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆288 · Jun 5, 2024 · Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆39 · Apr 27, 2025 · Updated 10 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆268 · Oct 3, 2025 · Updated 5 months ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆75 · Feb 18, 2026 · Updated 2 weeks ago
srush / Triton-Puzzles
Puzzles for learning Triton
☆2,324 · Nov 18, 2024 · Updated last year