MooreThreads / mutlass

MUSA Templates for Linear Algebra Subroutines

☆27

Alternatives and similar repositories for mutlass

Users who are interested in mutlass are comparing it to the libraries listed below.

QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆80 Updated 2 years ago
FdyCN / PTX-ISA
Chinese translation of the CUDA PTX-ISA document
☆42 Updated last month
buddy-compiler / buddy-benchmark
Benchmark Framework for Buddy Projects
☆54 Updated 3 weeks ago
FlagTree / flagtree
FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton.
☆53 Updated this week
THU-DSP-LAB / llvm-project
LLVM OpenCL C compiler suite for ventus GPGPU
☆48 Updated last week
galois-stack / galois
A tensor computing compiler based on tile programming for GPU, CPU, or TPU
☆43 Updated this week
nicolaswilde / cuda-tensorcore-hgemm
☆146 Updated 6 months ago
yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.
☆148 Updated 3 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 5 months ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆97 Updated 2 years ago
OpenGPGPU / opengpgpu
☆69 Updated 8 months ago
InfiniTensor / InfiniTensor
☆235 Updated last week
OpenPPL / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆83 Updated 2 years ago
fsword73 / HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
☆56 Updated 4 years ago
pigirons / sgemm_hsw
This is an implementation of sgemm_kernel on L1d cache.
☆228 Updated last year
AdvancedCompiler / AdvancedCompiler
Personal homepage of the Advanced Compiler Lab
☆103 Updated 2 months ago
njuhope / cuda_sgemm
☆113 Updated last year
xiaoweiChen / Heterogeneous-Computing-with-OpenCL-2.0
A Chinese translation of the English edition of "Heterogeneous Computing with OpenCL 2.0".
☆138 Updated 4 years ago
frankwang0818 / AI_compiler_development_guide
Free resource for the book AI Compiler Development Guide
☆45 Updated 2 years ago
MLIR-China / mlir-playground
Play with MLIR right in your browser
☆135 Updated 2 years ago
XiaoSong9905 / HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]
☆67 Updated 2 years ago
SJTU-ACA-Lab / blue-porcelain
☆145 Updated last year
tfruan2000 / mlsys-study-note
My study notes for mlsys
☆15 Updated 7 months ago
interestingLSY / CUDA-From-Correctness-To-Performance-Code
Codes & examples for "CUDA - From Correctness to Performance"
☆100 Updated 8 months ago
Cambricon / mlu-ops
Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU).
☆123 Updated last week
LeiWang1999 / tvm_gpu_gemm
Play with GEMM using TVM
☆91 Updated last year
pigirons / cpufp
A CPU tool for benchmarking the peak of floating points
☆548 Updated last month
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆197 Updated 4 months ago
AyakaGEMM / Hands-on-MLIR
☆17 Updated last year
zjin-lcf / HeCBench
☆247 Updated 2 weeks ago