bryancatanzaro / inplace
CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆46 Updated 10 years ago
Alternatives and similar repositories for inplace
Users who are interested in inplace are comparing it to the libraries listed below.
- sparse matrix pre-processing library ☆82 Updated last year
- High-performance, GPU-aware communication library ☆85 Updated 4 months ago
- Full-speed Array of Structures access ☆169 Updated 2 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- Tensor Contraction Code Generator ☆37 Updated 7 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆104 Updated 10 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated 2 years ago
- High-Performance Tensor Transpose library ☆195 Updated 2 years ago
- Sparse matrix computation library for GPU ☆56 Updated 4 years ago
- The SparseX sparse kernel optimization library ☆39 Updated 6 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 Updated 8 years ago
- Fork of magma to include more BLAS ☆28 Updated 8 years ago
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆205 Updated 9 months ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- A task benchmark ☆42 Updated 9 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 Updated 7 years ago
- Autonomic Performance Environment for eXascale (APEX) ☆48 Updated 2 weeks ago
- A unified framework across multiple programming platforms ☆38 Updated 11 months ago
- Kernel Tuning Toolkit ☆59 Updated 2 weeks ago
- Parallel Tensor Infrastructure (ParTI!) ☆28 Updated 4 years ago
- ulmBLAS ☆106 Updated 3 years ago
- Comb is a communication performance benchmarking tool. ☆25 Updated 2 years ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆33 Updated 3 months ago
- ☆91 Updated 8 years ago
- mallocMC: Memory Allocator for Many Core Architectures ☆55 Updated 3 weeks ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆75 Updated this week
- A library to benchmark CUDA code, similar to google benchmark. ☆28 Updated 4 years ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆35 Updated last month