comaniac / epoi

Benchmark PyTorch Custom Operators

☆14

Alternatives and similar repositories for epoi

Users who are interested in epoi are comparing it to the libraries listed below.

nox-410 / tvm.tl
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
☆50 Updated last year
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆137 Updated 2 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆64 Updated 3 years ago
tlc-pack / TLCBench
Benchmark scripts for TVM
☆75 Updated 3 years ago
cmu-catalyst / collage
System for automated integration of deep learning backends.
☆47 Updated 2 years ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆26 Updated last year
awslabs / ratex
☆23 Updated 8 months ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122 Updated 3 years ago
ankan-ban / llama_cu_awq
LLaMA INT4 CUDA inference with AWQ
☆54 Updated 6 months ago
awslabs / lorien
☆43 Updated last year
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆14 Updated 3 years ago
ceruleangu / Block-Sparse-Benchmark
Benchmark for matrix multiplication between dense and block-sparse (BSR) matrices in TVM, blocksparse (Gray et al.), and cuSPARSE.
☆24 Updated 4 years ago
awslabs / raf
☆144 Updated 6 months ago
anony-sub / chameleon
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
☆27 Updated 5 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆28 Updated 7 months ago
limenghao / AdaTune
Implementation of the paper "AdaTune: Adaptive Tensor Program Compilation Made Efficient" (NeurIPS 2020).
☆14 Updated 4 years ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38 Updated 4 months ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆200 Updated 3 years ago
tlc-pack / tenset
☆93 Updated 2 years ago
lixiuhong / batched_gemm
☆39 Updated 5 years ago
masahi / torchscript-to-tvm
☆69 Updated 2 years ago
LeiWang1999 / Stream-k.tvm
☆19 Updated 10 months ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆114 Updated 2 years ago
BoyuanFeng / APNN-TC
☆19 Updated 3 years ago
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆26 Updated 2 years ago
awslabs / slapo
A schedule language for large model training
☆149 Updated last year
masahi / tvm-cutlass-eval
☆40 Updated 3 years ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆94 Updated 3 weeks ago
uiuc-arc / felix
Optimize tensor programs quickly with Felix, a gradient descent autotuner.
☆28 Updated last year
sjtu-epcc / DVABatch
☆20 Updated 3 years ago