inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆606, updated this week
Alternatives and similar repositories for loopy
Users who are interested in loopy are comparing it to the libraries listed below.
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,304, updated 2 months ago
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆879, updated 2 weeks ago
- common in-memory tensor structure ☆1,014, updated last week
- The Foundation for All Legate Libraries ☆218, updated this week
- ☆416, updated this week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆206, updated last month
- CUDA Kernel Benchmarking Library ☆666, updated last week
- Kernel Tuner ☆344, updated this week
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆220, updated 4 years ago
- CUSP: A C++ Templated Sparse Matrix Library ☆413, updated last week
- DaCe - Data Centric Parallel Programming ☆535, updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆245, updated this week
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆259, updated 6 months ago
- Python wrapper for isl, an integer set library ☆77, updated this week
- GPUOCelot: A dynamic compilation framework for PTX ☆287, updated last year
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆294, updated last week
- The Legion Parallel Programming System ☆726, updated last week
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆326, updated 8 months ago
- STREAM, for lots of devices written in many programming models ☆343, updated 9 months ago
- POC work on MLIR backend ☆55, updated 10 months ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆543, updated this week
- Python bindings for UCX ☆135, updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆396, updated 3 weeks ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆873, updated last week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,755, updated last year
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆737, updated 4 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133, updated last year
- ☆241, updated 2 years ago
- A Python Compiler Design Toolkit ☆366, updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆404, updated 5 months ago