octoml / relax
A fork of tvm/unity
☆15Updated last year
Related projects:
- The quantitative performance comparison among DL compilers on CNN models.☆72Updated 4 years ago
- Benchmark scripts for TVM☆73Updated 2 years ago
- ☆66Updated last year
- ☆23Updated 7 months ago
- A home for the final text of all TVM RFCs.☆99Updated 3 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆82Updated 6 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆52Updated 6 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆93Updated last week
- tophub autotvm log collections☆70Updated last year
- ☆9Updated last year
- DietCode Code Release☆59Updated 2 years ago
- System for automated integration of deep learning backends.☆48Updated 2 years ago
- A Winograd Minimal Filter Implementation in CUDA☆20Updated 3 years ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation☆26Updated 4 years ago
- ☆34Updated 2 years ago
- This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).☆13Updated 3 years ago
- play gemm with tvm☆81Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆30Updated 4 months ago
- llama INT4 cuda inference with AWQ☆46Updated 2 months ago
- An IR for efficiently simulating distributed ML computation.☆24Updated 8 months ago
- An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.☆49Updated last month
- A translator from C to MLIR☆27Updated 2 years ago
- Issues related to MLPerf™ Inference policies, including rules and suggested changes☆55Updated 2 months ago
- TVM for Tenstorrent ASICs☆18Updated this week
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com☆38Updated 9 months ago
- Home for OctoML PyTorch Profiler☆105Updated last year
- Play with MLIR right in your browser☆122Updated last year
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.☆98Updated 9 months ago
- An optimizing compiler for decision tree ensemble inference.☆15Updated last week