inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆604 Updated last week
Alternatives and similar repositories for loopy
Users who are interested in loopy compare it to the libraries listed below.
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆875 Updated this week
- The Foundation for All Legate Libraries ☆217 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆241 Updated this week
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆257 Updated 6 months ago
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,301 Updated last month
- common in-memory tensor structure ☆1,002 Updated 3 weeks ago
- CUSP : A C++ Templated Sparse Matrix Library ☆412 Updated 2 weeks ago
- Kernel Tuner ☆337 Updated this week
- DaCe - Data Centric Parallel Programming ☆534 Updated this week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆536 Updated this week
- Symbolic Expression and Statement Module for new DSLs ☆205 Updated 4 years ago
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,747 Updated last year
- ☆415 Updated last week
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆291 Updated 2 months ago
- Distributed multigrid linear solver library on GPU ☆564 Updated 3 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆502 Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆218 Updated 3 years ago
- Python wrapper for isl, an integer set library ☆77 Updated this week
- ☆240 Updated 2 years ago
- The Legion Parallel Programming System ☆725 Updated 2 months ago
- Assembler for NVIDIA Maxwell architecture ☆1,002 Updated 2 years ago
- CUDA Kernel Benchmarking Library ☆650 Updated this week
- GPUOCelot: A dynamic compilation framework for PTX ☆287 Updated last year
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆205 Updated 3 weeks ago
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆432 Updated this week
- ☆538 Updated this week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆178 Updated 2 years ago
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆325 Updated 7 months ago
- STREAM, for lots of devices written in many programming models ☆339 Updated 9 months ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆715 Updated 3 months ago