OpenMLIR / triton_runner

Triton multi-level runner, including cubin, ptx, ttgir, etc.

☆16

Alternatives and similar repositories for triton_runner

Users who are interested in triton_runner are comparing it to the libraries listed below.

AlibabaResearch / mononn
☆28 Updated last year
open-neutrino / neutrino
☆122 Updated 3 weeks ago
SJTU-ReArch-Group / Paper-Reading-List
☆114 Updated 3 weeks ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆37 Updated 3 months ago
nox-410 / Welder
OSDI 2023 Welder, deep learning compiler
☆21 Updated last year
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆52 Updated last year
zhen8838 / handson-polyhedral
Tutorials about polyhedral compilation.
☆49 Updated 5 months ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆114 Updated 2 years ago
microsoft / ConvStencil
☆31 Updated last year
MoZeWei / moTuner
☆10 Updated 3 years ago
summerspringwei / souffle-ae
☆18 Updated last year
Yongqi-Zhuo / triton-tvm
Triton to TVM transpiler.
☆21 Updated 9 months ago
buddy-compiler / buddy-benchmark
Benchmark Framework for Buddy Projects
☆55 Updated last week
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆102 Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆105 Updated last year
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
PAA-NCIC / PE
Performance engineering
☆30 Updated last year
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆51 Updated last year
pku-liang / TileFlow
TileFlow is a performance analysis tool based on Timeloop for fusion dataflows
☆62 Updated last year
HPMLL / NVIDIA-Hopper-Benchmark
☆50 Updated last month
zhaiyi000 / tlm
☆42 Updated last year
HPMLL / DTC-SpMM_ASPLOS24
☆33 Updated last year
sitar-lab / NeuSight
☆45 Updated last month
galois-stack / galois
A tensor computing compiler based on tile programming for GPU, CPU, or TPU
☆44 Updated this week
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 5 months ago
c3sr / tcu_scope
☆51 Updated 6 years ago
tsinghua-ideal / Canvas
Canvas: End-to-End Kernel Architecture Search in Neural Networks
☆27 Updated 8 months ago
NMSU-PEARL / PPT-GPU
Performance Prediction Toolkit for GPUs
☆37 Updated 3 years ago
aoli-al / HFuse
Horizontal Fusion
☆25 Updated 3 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆33 Updated 4 years ago