inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆624 Updated last week
Alternatives and similar repositories for loopy
Users interested in loopy are comparing it to the libraries listed below.
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,348 Updated 9 months ago
- The Foundation for All Legate Libraries ☆233 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆254 Updated last week
- DaCe - Data Centric Parallel Programming ☆573 Updated this week
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆933 Updated 3 weeks ago
- Kernel Tuner ☆381 Updated last week
- common in-memory tensor structure ☆1,161 Updated last week
- ☆422 Updated last month
- Python wrapper for isl, an integer set library ☆83 Updated last week
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆322 Updated 5 months ago
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆272 Updated last year
- POC work on MLIR backend ☆61 Updated last year
- Symbolic Expression and Statement Module for new DSLs ☆205 Updated 5 years ago
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆222 Updated 5 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆138 Updated 2 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆184 Updated 3 years ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆212 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆390 Updated last week
- ☆250 Updated 6 months ago
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆334 Updated last year
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated last year
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆91 Updated last week
- Assembler for NVIDIA Volta and Turing GPUs ☆239 Updated 4 years ago
- STREAM, for lots of devices written in many programming models ☆355 Updated 5 months ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆601 Updated 3 weeks ago
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆82 Updated 3 years ago
- Python SYCL bindings and SYCL-based Python Array API library ☆121 Updated last week
- The Tensor Algebra SuperOptimizer for Deep Learning ☆740 Updated 3 years ago
- The Legion Parallel Programming System ☆750 Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆380 Updated this week