bryancatanzaro / inplace
CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆45 Updated 9 years ago
Related projects
Alternatives and complementary repositories for inplace
- sparse matrix pre-processing library ☆81 Updated 6 months ago
- Full-speed Array of Structures access ☆161 Updated last year
- High-Performance Tensor Transpose library ☆185 Updated last year
- CUDA Tensor Transpose (cuTT) library ☆50 Updated 7 years ago
- Tensor Contraction Code Generator ☆36 Updated 7 years ago
- High-performance, GPU-aware communication library ☆84 Updated 3 weeks ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 4 years ago
- A C++ allocator based on cudaMallocManaged ☆23 Updated 6 years ago
- The SparseX sparse kernel optimization library ☆39 Updated 5 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- ulmBLAS ☆104 Updated 2 years ago
- Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays ☆201 Updated 3 months ago
- Experimental Linear Algebra Performance Studies ☆12 Updated 7 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 Updated 3 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 Updated 7 years ago
- Autonomic Performance Environment for eXascale (APEX) ☆38 Updated 3 weeks ago
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- Use CUDA intrinsics with user-defined types ☆47 Updated 10 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 Updated 7 years ago
- ☆90 Updated 7 years ago
- A task benchmark ☆40 Updated 3 months ago
- Fork of magma to include more BLAS ☆28 Updated 7 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆103 Updated 9 years ago
- Tensor Contraction C++ Library ☆50 Updated 5 years ago
- QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experi… ☆27 Updated 3 months ago
- Range-based for loops to iterate over a range of numbers or values ☆35 Updated 7 years ago
- cuASR: CUDA Algebra for Semirings ☆34 Updated 2 years ago
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆73 Updated this week
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆69 Updated 2 years ago
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 Updated 9 years ago