inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆576 Updated last week
Related projects:
- common in-memory tensor structure ☆890 Updated last week
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆839 Updated this week
- DaCe - Data Centric Parallel Programming ☆490 Updated this week
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,230 Updated 5 months ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆213 Updated this week
- Kernel Tuner ☆273 Updated this week
- The Foundation for All Legate Libraries ☆186 Updated last week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆513 Updated 3 months ago
- CUDA Kernel Benchmarking Library ☆482 Updated 3 months ago
- CUSP : A C++ Templated Sparse Matrix Library ☆400 Updated 8 months ago
- The Legion Parallel Programming System ☆675 Updated last week
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆219 Updated 3 years ago
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆210 Updated 3 months ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆167 Updated last year
- ☆392 Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆528 Updated last month
- Assembler for NVIDIA Maxwell architecture ☆942 Updated last year
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆267 Updated 4 months ago
- Portable and vendor neutral framework for parallel programming on heterogeneous platforms. ☆386 Updated last month
- An implementation of BLAS using the SYCL open standard. ☆250 Updated 2 weeks ago
- Assembler for NVIDIA Volta and Turing GPUs ☆195 Updated 2 years ago
- Next generation BLAS implementation for ROCm platform ☆341 Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆279 Updated last week
- Python wrapper for isl, an integer set library ☆73 Updated last week
- ☆226 Updated last year
- An Aspiring Drop-In Replacement for NumPy at Scale ☆610 Updated last week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆389 Updated last year
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆308 Updated 8 months ago
- oneAPI Math Kernel Library (oneMKL) Interfaces ☆606 Updated last week
- ☆465 Updated this week