OpenMLIR / mlir-tutorial

Hands-On Practical MLIR Tutorial

☆51

Alternatives and similar repositories for mlir-tutorial

Users that are interested in mlir-tutorial are comparing it to the libraries listed below

Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆216Updated last year
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆405Updated 3 weeks ago
JackonYang / hands-on-tvm
Hands-on model tuning with TVM, with profiling on a Mac M1, x86 CPU, and GTX-1080 GPU.
☆49Updated 2 years ago
InfiniTensor / InfiniTensor
☆285Updated last week
Archermmt / tvm_walk_through
Code reading for TVM
☆76Updated 4 years ago
nicolaswilde / cuda-tensorcore-hgemm
☆157Updated last year
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆270Updated 7 months ago
MARD1NO / CUDA-PPT
☆118Updated 10 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆364Updated 3 years ago
buddy-compiler / buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
☆690Updated last week
reed-lau / cute-gemm
☆161Updated 2 months ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆403Updated last year
flagos-ai / FlagTree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆200Updated this week
AdvancedCompiler / AdvancedCompiler
Homepage of the Advanced Compiler Lab
☆195Updated 3 months ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆426Updated 2 years ago
galois-stack / galois
A tensor computing compiler based on tile programming for GPU, CPU, or TPU
☆45Updated this week
KEKE046 / mlir-tutorial
Hands-On Practical MLIR Tutorial
☆710Updated 2 years ago
ArthurinRUC / cutlass-notes
From Minimal GEMM to Everything
☆101Updated last month
nicolaswilde / cuda-sgemm
☆70Updated last year
DefTruth / CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
☆63Updated 9 months ago
ColfaxResearch / cfx-article-src
☆173Updated 8 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆325Updated 2 months ago
FdyCN / PTX-ISA
Chinese translation of the CUDA PTX-ISA document
☆49Updated 4 months ago
violetDelia / MLIR-Tutorial
☆79Updated 3 months ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆520Updated last year
buddy-compiler / buddy-benchmark
Benchmark Framework for Buddy Projects
☆55Updated 3 months ago
Qwesh157 / conv_op_optimization
This project is about convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution.
☆43Updated 4 months ago
StrongSpoon / tvm.schedule
Examples for the TVM schedule API
☆101Updated 2 years ago
njuhope / cuda_sgemm
☆120Updated last year
guanrenyang / Programming-Massively-Parallel-Processors
Solutions to Programming Massively Parallel Processors
☆49Updated 2 years ago