MooreThreads / mutlass
MUSA Templates for Linear Algebra Subroutines
☆37 Updated 9 months ago
Alternatives and similar repositories for mutlass
Users who are interested in mutlass are comparing it to the libraries listed below
- This is an implementation of sgemm_kernel on L1d cache. ☆233 Updated last year
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆94 Updated 2 years ago
- ☆156 Updated 11 months ago
- A CPU tool for benchmarking peak floating-point performance ☆569 Updated this week
- A tensor computing compiler based on tile programming for GPU, CPU, or TPU ☆45 Updated 3 months ago
- Personal homepage of the Advanced Compiler Lab ☆178 Updated 2 months ago
- ☆274 Updated last month
- Chinese translation of the CUDA PTX-ISA document ☆47 Updated 2 months ago
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU). ☆143 Updated last week
- ☆116 Updated last year
- A Chinese translation of the English edition of "Heterogeneous Computing with OpenCL 2.0". ☆140 Updated 5 years ago
- Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading. ☆158 Updated 3 years ago
- FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang… ☆146 Updated this week
- Answers for "Programming Massively Parallel Processors: A Hands-on Approach", 2nd edition ☆33 Updated 3 years ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆84 Updated 2 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆397 Updated 11 months ago
- ☆26 Updated 4 months ago
- Free resource for the book AI Compiler Development Guide ☆49 Updated 2 years ago
- row-major matmul optimization ☆692 Updated 4 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆190 Updated 10 months ago
- Dissecting NVIDIA GPU Architecture ☆115 Updated 3 years ago
- Machine learning compiler based on MLIR for Sophgo TPU. ☆829 Updated last week
- A model compilation solution for various hardware ☆457 Updated 4 months ago
- An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures). ☆670 Updated last week
- Hands-On Practical MLIR Tutorial ☆46 Updated 4 months ago
- mperf is an operator performance tuning toolbox for mobile and embedded platforms ☆192 Updated 2 years ago
- A simple high performance CUDA GEMM implementation. ☆421 Updated last year
- ☆255 Updated 2 years ago
- Hands-on model tuning with TVM, profiled on a Mac M1, x86 CPU, and GTX-1080 GPU. ☆49 Updated 2 years ago
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c… ☆456 Updated last month