KAdamek / SMFFT
Fast Fourier transform on GPU in shared memory for the AstroAccelerate project
☆26 · Updated 4 years ago
Alternatives and similar repositories for SMFFT:
Users who are interested in SMFFT are comparing it to the libraries listed below:
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA ☆16 · Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 · Updated 2 years ago
- sparse matrix pre-processing library ☆81 · Updated last year
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆53 · Updated 2 months ago
- A Massively Parallel FFT Library for CPU/GPU ☆56 · Updated 4 years ago
- a tester for BLAS libraries including OpenBLAS and Intel MKL. This project is based on ATLAS BLAS Tester ☆34 · Updated 2 years ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆36 · Updated last month
- PLASMA is a software package for solving problems in dense linear algebra using OpenMP ☆29 · Updated 2 weeks ago
- Benchmarking OpenBLAS on the Apple M1 ☆18 · Updated 4 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Updated 8 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆49 · Updated 4 years ago
- cuASR: CUDA Algebra for Semirings ☆35 · Updated 2 years ago
- A C++ allocator based on cudaMallocManaged ☆23 · Updated 6 years ago
- Next generation library for iterative sparse solvers for ROCm platform ☆81 · Updated last week
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- Recursive LAPACK Collection ☆42 · Updated 3 years ago
- Yaksa: High-performance Noncontiguous Data Management ☆13 · Updated 7 months ago
- A hierarchical matrix C/C++ library ☆23 · Updated last week
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction ☆68 · Updated last month
- C++ library for graph ordering ☆14 · Updated 5 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 9 years ago
- Parallel Tensor Infrastructure (ParTI!) ☆28 · Updated 4 years ago
- The fftMPI library performs 2d/3d FFTs in parallel for grids distributed across MPI processes. ☆14 · Updated 2 years ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆32 · Updated 2 months ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆37 · Updated 7 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 · Updated 5 years ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆71 · Updated 3 years ago
- High-performance, GPU-aware communication library ☆85 · Updated 3 months ago