inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆615 Updated last week
Alternatives and similar repositories for loopy
Users who are interested in loopy are comparing it to the libraries listed below.
- common in-memory tensor structure ☆1,087 Updated 2 weeks ago
- The Foundation for All Legate Libraries ☆229 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆253 Updated 2 weeks ago
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,328 Updated 6 months ago
- DaCe - Data Centric Parallel Programming ☆558 Updated this week
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆914 Updated 2 weeks ago
- Python SYCL bindings and SYCL-based Python Array API library ☆117 Updated this week
- Python wrapper for isl, an integer set library ☆78 Updated 2 weeks ago
- ☆422 Updated last week
- Kernel Tuner ☆370 Updated this week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆182 Updated 2 years ago
- The Legion Parallel Programming System ☆747 Updated 3 weeks ago
- Symbolic Expression and Statement Module for new DSLs ☆205 Updated 5 years ago
- POC work on MLIR backend ☆60 Updated last year
- CUSP : A C++ Templated Sparse Matrix Library ☆417 Updated 2 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆209 Updated last week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆563 Updated last month
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆89 Updated 3 weeks ago
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆332 Updated last year
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆81 Updated 3 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆388 Updated last week
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆222 Updated 4 years ago
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆307 Updated 2 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 9 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆289 Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆231 Updated 3 years ago
- STREAM, for lots of devices written in many programming models ☆350 Updated last month
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆131 Updated 3 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆425 Updated 9 months ago
- Python bindings for UCX ☆140 Updated last month