inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆613 Updated last week
Alternatives and similar repositories for loopy
Users who are interested in loopy are comparing it to the libraries listed below.
- DaCe - Data Centric Parallel Programming ☆547 Updated this week
- Kernel Tuner ☆357 Updated last week
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,317 Updated 4 months ago
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆891 Updated this week
- The Foundation for All Legate Libraries ☆221 Updated this week
- common in-memory tensor structure ☆1,046 Updated 2 months ago
- Python wrapper for isl, an integer set library ☆77 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆247 Updated last week
- ☆419 Updated last week
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆264 Updated 8 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆208 Updated 3 months ago
- The Legion Parallel Programming System ☆737 Updated last month
- CUSP : A C++ Templated Sparse Matrix Library ☆415 Updated 2 weeks ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆440 Updated this week
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆304 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆382 Updated last week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆414 Updated 7 months ago
- POC work on MLIR backend ☆58 Updated last year
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 Updated 7 months ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆181 Updated 2 years ago
- Symbolic Expression and Statement Module for new DSLs ☆206 Updated 4 years ago
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆221 Updated 4 years ago
- STREAM, for lots of devices written in many programming models ☆347 Updated 11 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆134 Updated last year
- Python SYCL bindings and SYCL-based Python Array API library ☆116 Updated this week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆550 Updated 2 weeks ago
- Assembler for NVIDIA Volta and Turing GPUs ☆229 Updated 3 years ago
- CUDA Kernel Benchmarking Library ☆701 Updated last week
- BLAS-like Library Instantiation Software Framework ☆145 Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆348 Updated this week