KAdamek / SMFFT
Fast Fourier transform on GPU in shared memory for the AstroAccelerate project
☆26 · Updated 4 years ago
Alternatives and similar repositories for SMFFT:
Users interested in SMFFT are comparing it to the libraries listed below:
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 · Updated last week
- A C++ allocator based on cudaMallocManaged ☆23 · Updated 6 years ago
- Kernel Tuning Toolkit ☆55 · Updated 2 months ago
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- FFTX Project ☆20 · Updated last month
- Autonomic Performance Environment for eXascale (APEX) ☆42 · Updated this week
- High-Performance Machine Learning Primitives ☆11 · Updated 3 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- PLASMA is a software package for solving problems in dense linear algebra using OpenMP ☆25 · Updated last week
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆21 · Updated last month
- Sparse matrix pre-processing library ☆81 · Updated 8 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆67 · Updated last year
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Updated 8 years ago
- Error-Free Transformations as building blocks for compensated algorithms ☆14 · Updated last year
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA ☆16 · Updated 2 years ago
- WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze ☆17 · Updated 5 years ago
- The SparseX sparse kernel optimization library ☆39 · Updated 6 years ago
- HiCMA: Hierarchical Computations on Manycore Architectures ☆30 · Updated last year
- MagmaDNN: a simple deep learning framework in C++ ☆48 · Updated 4 years ago
- A Massively Parallel FFT Library for CPU/GPU ☆54 · Updated 4 years ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures ☆49 · Updated last year
- Department of Energy Standard Utility Library ☆30 · Updated 4 months ago
- MATLAB Code for Parameters of Floating-Point Arithmetics ☆9 · Updated 2 years ago
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 ☆38 · Updated 11 months ago
- Implementation of AMD HIP for CPUs ☆22 · Updated 4 years ago
- An OpenMP runtime implemented using HPX ☆23 · Updated 2 years ago
- Next generation library for iterative sparse solvers for ROCm platform ☆79 · Updated this week
- Tensor Contraction Code Generator ☆36 · Updated 7 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆47 · Updated last year