inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆621 Updated last week
Alternatives and similar repositories for loopy
Users who are interested in loopy are comparing it to the libraries listed below.
- DaCe - Data Centric Parallel Programming ☆572 Updated this week
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆932 Updated last week
- common in-memory tensor structure ☆1,139 Updated last month
- Kernel Tuner ☆379 Updated this week
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,344 Updated 9 months ago
- Python SYCL bindings and SYCL-based Python Array API library ☆121 Updated this week
- The Foundation for All Legate Libraries ☆233 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆253 Updated 2 weeks ago
- Python wrapper for isl, an integer set library ☆82 Updated this week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆183 Updated 3 years ago
- ☆423 Updated 2 weeks ago
- CUSP : A C++ Templated Sparse Matrix Library ☆421 Updated 5 months ago
- POC work on MLIR backend ☆61 Updated last year
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆82 Updated 3 years ago
- Symbolic Expression and Statement Module for new DSLs ☆205 Updated 5 years ago
- STREAM, for lots of devices written in many programming models ☆354 Updated 4 months ago
- GPUOCelot: A dynamic compilation framework for PTX ☆289 Updated 2 years ago
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆471 Updated 4 months ago
- The Legion Parallel Programming System ☆751 Updated last month
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆212 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated last year
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆316 Updated 4 months ago
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆333 Updated last year
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆131 Updated 3 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆48 Updated 10 years ago
- Python bindings for UCX ☆140 Updated 4 months ago
- Code appendix to an OpenCL matrix-multiplication tutorial ☆178 Updated 8 years ago
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆273 Updated last year
- OpenCL integration for Python, plus shiny features ☆1,127 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆236 Updated 4 years ago