octoml / relax

A fork of tvm/unity

☆14

Alternatives and similar repositories for relax

Users who are interested in relax are comparing it to the libraries listed below.

masahi / torchscript-to-tvm
☆69Updated 2 years ago
buaa-hipo / dlcompiler-comparison
The quantitative performance comparison among DL compilers on CNN models.
☆74Updated 4 years ago
tlc-pack / TLCBench
Benchmark scripts for TVM
☆74Updated 3 years ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆92Updated last week
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆88Updated this week
tlc-pack / tlcpack
☆24Updated last year
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆106Updated 8 months ago
pytorch-labs / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆43Updated 2 months ago
masahi / tvm-cutlass-eval
☆40Updated 3 years ago
ankan-ban / llama_cu_awq
llama INT4 cuda inference with AWQ
☆54Updated 4 months ago
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆110Updated 8 months ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91Updated last year
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆25Updated 5 months ago
limenghao / AdaTune
This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).
☆13Updated 4 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆64Updated 2 years ago
wzh99 / relay-mlir
An MLIR-based toy DL compiler for TVM Relay.
☆58Updated 2 years ago
cmu-catalyst / collage
System for automated integration of deep learning backends.
☆48Updated 2 years ago
nox-410 / tvm.tl
An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.
☆50Updated 10 months ago
anony-sub / chameleon
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
☆27Updated 5 years ago
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆110Updated 6 months ago
tlc-pack / tenset
☆92Updated 2 years ago
polymage-labs / mlirx
MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com
☆38Updated last year
tlc-pack / tophub
tophub autotvm log collections
☆69Updated 2 years ago
LeiWang1999 / Stream-k.tvm
☆19Updated 8 months ago
LeiWang1999 / TVM.CMakeExtend
Tutorials on extending and importing TVM with CMake include dependencies.
☆13Updated 7 months ago
awslabs / lorien
☆43Updated last year
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138Updated 2 years ago
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆119Updated this week
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆62Updated 9 months ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.
☆109Updated 10 months ago