inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆602, updated last week
Alternatives and similar repositories for loopy:
Users who are interested in loopy are comparing it to the libraries listed below:
- Common in-memory tensor structure (☆974, updated this week)
- Library for specialized dense and sparse matrix operations, and deep learning primitives (☆866, updated this week)
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs (☆1,293, updated last year)
- The Foundation for All Legate Libraries (☆212, updated this week)
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and Mac OS X (☆220, updated 4 years ago)
- Python interface for MLIR, the Multi-Level Intermediate Representation (☆248, updated 4 months ago)
- Stretching GPU performance for GEMMs and tensor contractions (☆235, updated this week)
- Kernel Tuner (☆326, updated this week)
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC) (☆533, updated last month)
- (☆409, updated this week)
- Python wrapper for isl, an integer set library (☆77, updated this week)
- CUSP: A C++ Templated Sparse Matrix Library (☆412, updated 5 months ago)
- Pluto: An automatic polyhedral parallelizer and locality optimizer (☆288, updated 2 weeks ago)
- Next generation BLAS implementation for ROCm platform (☆362, updated this week)
- CLTune: An automatic OpenCL & CUDA kernel tuner (☆177, updated 2 years ago)
- This is a set of simple programs that can be used to explore the features of a parallel platform (☆427, updated 2 weeks ago)
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python (☆320, updated 6 months ago)
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement (☆261, updated 3 months ago)
- Tuned OpenCL BLAS (☆1,094, updated 5 months ago)
- A software library containing BLAS functions written in OpenCL (☆853, updated 8 months ago)
- Backward compatible ML compute opset inspired by HLO/MHLO (☆465, updated this week)
- CUDA Kernel Benchmarking Library (☆618, updated this week)
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure (☆840, updated this week)
- Example Python (NumPy) and CUDA installable package with a C-extension library (☆143, updated 5 years ago)
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels (☆130, updated last year)
- DaCe: Data Centric Parallel Programming (☆522, updated this week)
- Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases (☆286, updated 3 years ago)
- Distributed multigrid linear solver library on GPU (☆550, updated 2 months ago)
- The Tensor Algebra SuperOptimizer for Deep Learning (☆705, updated 2 years ago)
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm (☆205, updated last week)