dsharlet / slinky
Optimize pipelines for locality
☆9 Updated last week
Alternatives and similar repositories for slinky
Users who are interested in slinky are comparing it to the libraries listed below.
- CUDA matrix computation library specialized for small matrix operations (3x3, 4x4, 1x3, 1x4, etc.). Including buffer ☆19 Updated last year
- Reference implementation of the draft C++ GraphBLAS specification. ☆33 Updated 4 months ago
- a compiler for re-writing image processing functions in C++ to Halide ☆23 Updated 2 years ago
- A simple, but fast, triangular solver ☆17 Updated 4 years ago
- Program Generator for Small-Scale Linear Algebra Applications ☆29 Updated 7 years ago
- ☆14 Updated 2 years ago
- ☆31 Updated 3 years ago
- variant type for CUDA ☆12 Updated 9 years ago
- Monte Carlo Render Viewing and Visualization Tools ☆11 Updated 4 years ago
- Experimental ranges for CUDA ☆24 Updated 6 years ago
- ☆23 Updated 2 years ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 Updated this week
- Atomistic Spin Simulation Framework ☆66 Updated 4 years ago
- FMM Template Library ☆45 Updated 7 years ago
- data-parallel out-of-core library ☆50 Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 Updated 3 months ago
- Resources for the SIAMCSE21 minitutorial "Automatic Differentiation as a Tool for Computational Science" ☆14 Updated 4 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- a fork of clang with Sierra patches ☆20 Updated 6 years ago
- Skeletonide is a parallel implementation of Zhang-Suen morphological thinning algorithm written in Halide-lang. Use it for fast skeletoni… ☆14 Updated 4 years ago
- C++ library for graph ordering ☆14 Updated 5 years ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆35 Updated 2 months ago
- Tensor Contraction Code Generator ☆37 Updated 7 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆46 Updated 10 years ago
- nimbus: a cloud computing framework for high performance computations ☆25 Updated 5 years ago
- CUDA Dynamic Memory Allocator for SOA Data Layout ☆35 Updated 3 years ago
- Multi-dimensional C++ arrays which store objects in a Struct-of-Arrays (SoA) memory layout for efficient vectorization and zero address g… ☆74 Updated 4 years ago
- Vectorization EDSL library ☆15 Updated 6 years ago
- A nanobind example project ☆107 Updated 2 months ago
- tokenizer and parser for circle projects ☆11 Updated 5 years ago