KAdamek / SMFFT
Fast Fourier transform on GPU in shared memory for the AstroAccelerate project
☆26Updated 4 years ago
Alternatives and similar repositories for SMFFT:
Users who are interested in SMFFT are comparing it to the libraries listed below.
- sparse matrix pre-processing library☆81Updated 9 months ago
- QCD for Intel Xeon Phi and Xeon processors☆14Updated 11 months ago
- A Massively Parallel FFT Library for CPU/GPU☆55Updated 4 years ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.☆59Updated 2 years ago
- ☆37Updated 3 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA☆16Updated 2 years ago
- OpenMPL (Open Math Performance Library) is a set of open-source math libraries, including BLAS, LAPACK, FFT, VML, and others.☆18Updated last year
- Codeplay project for contributions to the LLVM SYCL implementation☆30Updated 4 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions.☆22Updated 9 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- Autonomic Performance Environment for eXascale (APEX)☆43Updated last week
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support☆50Updated last month
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆50Updated last year
- The SparseX sparse kernel optimization library☆39Updated 6 years ago
- A C++ allocator based on cudaMallocManaged☆23Updated 6 years ago
- PLASMA is a software package for solving problems in dense linear algebra using OpenMP☆26Updated 3 weeks ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications☆28Updated 8 years ago
- Next generation library for iterative sparse solvers for ROCm platform☆78Updated this week
- bhSPARSE: A Sparse BLAS Library☆16Updated 9 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU…☆35Updated 9 years ago
- MagmaDNN: a simple deep learning framework in c++☆49Updated 4 years ago
- tools to create performance and roofline plots from measured data☆58Updated 10 years ago
- BLAS implementation for Intel FPGA☆76Updated 4 years ago
- HiCMA: Hierarchical Computations on Manycore Architectures☆30Updated last year
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication☆21Updated 2 months ago
- Distributed View Extension for Kokkos☆44Updated 2 months ago
- Benchmarking OpenBLAS on the Apple M1☆18Updated 4 years ago
- Fork of magma to include more BLAS☆28Updated 8 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆68Updated last year
- Kernel Tuning Toolkit☆57Updated 2 weeks ago