scalable-analyses / sme
☆17Updated 3 months ago
Alternatives and similar repositories for sme:
Users who are interested in sme are comparing it to the libraries listed below
- Example for running IREE in a bare-metal Arm environment.☆25Updated last week
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices.☆112Updated 2 weeks ago
- Running linear algebra as fast as possible on Apple silicon☆18Updated last year
- An HPL-AI implementation for Fugaku☆19Updated 3 years ago
- ☆38Updated this week
- Bridging polyhedral analysis tools to the MLIR framework☆107Updated last year
- A lightweight, Pythonic, frontend for MLIR☆80Updated last year
- ☆28Updated 2 years ago
- A translator that supports translating NVPTX to SPIR-V. This translator is modified from the LLVM-SPIR-V Translator.☆36Updated 3 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆54Updated 4 months ago
- ☆30Updated 2 years ago
- Custom-Precision Floating-point numbers.☆29Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆47Updated last year
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support.☆33Updated 2 years ago
- HeteroCL-MLIR dialect for accelerator design☆41Updated 4 months ago
- compiling DSLs to high-level hardware instructions☆22Updated 2 years ago
- Data-Centric MLIR dialect☆40Updated last year
- Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)☆12Updated 5 months ago
- The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github…☆32Updated last month
- ☆20Updated 3 years ago
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com☆38Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.☆127Updated last year
- MLIR tools and dialect for GraphBLAS☆18Updated 2 years ago
- Benchmark for measuring the performance of sparse and irregular memory access.☆76Updated this week
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆82Updated 9 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆71Updated last year
- GPUOcelot: A dynamic compilation framework for PTX☆157Updated 3 weeks ago
- Conversions to MLIR EmitC☆126Updated last month
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs☆27Updated 4 months ago
- ☆86Updated this week