pytorch-labs / tritonparse

TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer (WIP) for Triton Kernels

☆138

Alternatives and similar repositories for tritonparse

Users who are interested in tritonparse are comparing it to the libraries listed below.


pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212 Updated this week
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆199 Updated this week
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆139 Updated 2 months ago
cchan / tccl
extensible collectives library in triton
☆88 Updated 4 months ago
bertmaher / simplegemm
☆110 Updated 4 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated this week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆93 Updated last month
triton-lang / kernels
☆85 Updated 9 months ago
openxla / shardy
MLIR-based partitioning system
☆115 Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆388 Updated this week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339 Updated this week
pytorch-labs / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆43 Updated 4 months ago
RadeonFlow / RadeonFlow_Kernels
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
☆60 Updated this week
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆69 Updated 3 weeks ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247 Updated 6 months ago
NVIDIA / jaxpp
JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
☆52 Updated 3 weeks ago
HazyResearch / Megakernels
kernels, of the mega variety
☆466 Updated 2 months ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆310 Updated 4 months ago
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆98 Updated 6 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆205 Updated 3 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆261 Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆79 Updated last week
ademeure / cuda-side-boost
☆41 Updated 3 months ago
NVIDIA / compute-eval
Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…
☆57 Updated last month
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆505 Updated last week
gpu-mode / ring-attention
ring-attention experiments
☆146 Updated 9 months ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆75 Updated this week
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆132 Updated 2 weeks ago