intel / Deep-learning-math-kernel-research
☆30Updated 2 years ago
Alternatives and similar repositories for Deep-learning-math-kernel-research:
Users who are interested in Deep-learning-math-kernel-research are comparing it to the libraries listed below.
- Assembler for NVIDIA Volta and Turing GPUs☆214Updated 3 years ago
- ☆60Updated 2 months ago
- heterogeneity-aware-lowering-and-optimization☆254Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.☆128Updated last year
- ☆194Updated last year
- OpenAI Triton backend for Intel® GPUs☆165Updated this week
- Dissecting NVIDIA GPU Architecture☆88Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆32Updated 4 years ago
- Subpart source code of deepcore v0.7☆27Updated 4 years ago
- A home for the final text of all TVM RFCs.☆102Updated 5 months ago
- ☆137Updated this week
- ☆233Updated 2 years ago
- Development repository for the Triton-Linalg conversion☆176Updated 3 weeks ago
- Implement asm gemm on vega64 for 4096x4096 fp32 matrix☆22Updated 5 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU☆80Updated 5 years ago
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com☆38Updated last year
- ☆88Updated 10 months ago
- Shared Middle-Layer for Triton Compilation☆228Updated this week
- CUDA Tensor Transpose (cuTT) library☆51Updated 7 years ago
- 14 basic topics for VEGA64 performance optimization☆53Updated 3 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM☆70Updated 8 years ago
- tophub autotvm log collections☆70Updated 2 years ago
- Experimental projects related to TensorRT☆89Updated this week
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS☆25Updated 4 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆130Updated this week
- ☆43Updated 4 years ago
- ☆406Updated this week
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.☆68Updated 5 years ago
- collection of benchmarks to measure basic GPU capabilities☆302Updated 2 weeks ago
- An extension library of WMMA API (Tensor Core API)☆90Updated 7 months ago