bryancatanzaro / inplace
CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆46 · Updated 10 years ago
Alternatives and similar repositories for inplace:
Users who are interested in inplace are comparing it to the libraries listed below.
- Full-speed Array of Structures access ☆169 · Updated 2 years ago
- sparse matrix pre-processing library ☆81 · Updated last year
- High-performance, GPU-aware communication library ☆85 · Updated 3 months ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 · Updated 5 years ago
- Tensor Contraction Code Generator ☆37 · Updated 7 years ago
- A task benchmark ☆42 · Updated 9 months ago
- CUDA Tensor Transpose (cuTT) library ☆51 · Updated 7 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆104 · Updated 10 years ago
- A unified framework across multiple programming platforms ☆37 · Updated 10 months ago
- Autonomic Performance Environment for eXascale (APEX) ☆47 · Updated this week
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆203 · Updated 9 months ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 4 years ago
- High-Performance Tensor Transpose library ☆195 · Updated last year
- Use CUDA intrinsics with user-defined types ☆47 · Updated 10 years ago
- cuASR: CUDA Algebra for Semirings ☆35 · Updated 2 years ago
- Fork of magma to include more BLAS ☆28 · Updated 8 years ago
- ulmBLAS ☆106 · Updated 3 years ago
- Sparse matrix computation library for GPU ☆56 · Updated 4 years ago
- ☆70 · Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 · Updated 7 years ago
- A framework that helps implementing swizzle GPU kernels ☆41 · Updated 5 years ago
- ☆29 · Updated last week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 · Updated 3 years ago
- Compute applications. ☆24 · Updated 5 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 7 years ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆36 · Updated last month
- Kernel Tuning Toolkit ☆59 · Updated last month
- Implementation of AMD HIP for CPUs ☆22 · Updated 4 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 · Updated 11 months ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Updated 8 years ago