inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆599 · Updated this week
Alternatives and similar repositories for loopy:
Users who are interested in loopy are comparing it to the libraries listed below.
- common in-memory tensor structure ☆963 · Updated last week
- ☆408 · Updated this week
- The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs ☆1,287 · Updated 11 months ago
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆867 · Updated this week
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆177 · Updated 2 years ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 · Updated this week
- CUDA Kernel Benchmarking Library ☆593 · Updated last week
- CUSP: A C++ Templated Sparse Matrix Library ☆411 · Updated 4 months ago
- Kernel Tuner ☆325 · Updated this week
- The Foundation for All Legate Libraries ☆206 · Updated this week
- Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX ☆220 · Updated 4 years ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆527 · Updated last week
- ☆233 · Updated 2 years ago
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆283 · Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆358 · Updated last week
- Symbolic Expression and Statement Module for new DSLs ☆205 · Updated 4 years ago
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆423 · Updated last week
- STREAM, for lots of devices written in many programming models ☆329 · Updated 6 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆129 · Updated last year
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆246 · Updated 3 months ago
- Python wrapper for isl, an integer set library ☆76 · Updated this week
- DaCe - Data Centric Parallel Programming ☆515 · Updated this week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆457 · Updated this week
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆830 · Updated this week
- Tuned OpenCL BLAS ☆1,090 · Updated 4 months ago
- Open single and half precision gemm implementations ☆378 · Updated last year
- Assembler for NVIDIA Volta and Turing GPUs ☆214 · Updated 3 years ago
- The Tensor Algebra SuperOptimizer for Deep Learning ☆704 · Updated 2 years ago
- GPUOCelot: A dynamic compilation framework for PTX ☆286 · Updated last year
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,735 · Updated last year