pytorch-labs / triton-cpuLinks

An experimental CPU backend for Triton (https//github.com/openai/triton)

☆43

Alternatives and similar repositories for triton-cpu

Users that are interested in triton-cpu are comparing it to the libraries listed below

Sorting:

openxla / shardy
MLIR-based partitioning system
☆114Updated this week
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆91Updated last month
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆138Updated 2 months ago
nod-ai / SHARK-ModelDev
Unified compiler/runtime for interfacing with PyTorch Dynamo.
☆101Updated 3 weeks ago
triton-lang / kernels
☆85Updated 8 months ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆74Updated last week
ROCm / TransformerEngine
☆40Updated this week
makslevental / nelli
A lightweight, Pythonic, frontend for MLIR
☆80Updated last year
wangsiping97 / FastGEMV
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆113Updated last year
cchan / tccl
extensible collectives library in triton
☆88Updated 4 months ago
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆200Updated this week
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99Updated last year
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆40Updated last year
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆196Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆79Updated last week
pytorch-labs / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer(WIP) for Triton Kernels
☆138Updated this week
pytorch / tlparse
TORCH_LOGS parser for PT2
☆47Updated last week
ROCm / triton
Development repository for the Triton language and compiler
☆126Updated this week
iree-org / iree-nvgpu
☆50Updated last year
iree-org / iree-turbine
IREE's PyTorch Frontend, based on Torch Dynamo.
☆94Updated last week
ROCm / rocMLIR
☆148Updated this week
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆137Updated 2 years ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆98Updated 6 months ago
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆149Updated 8 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆260Updated this week
makslevental / mlir-python-extras
The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings.
☆104Updated last week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆139Updated 3 months ago
ademeure / cuda-side-boost
☆20Updated 2 months ago
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆110Updated 7 months ago
sunlex0717 / DissectingTensorCores
☆106Updated last year