KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆16 · Updated last year
Related projects
Alternatives and complementary repositories for GPU_Overlap-and-save_convolution
- Fast Fourier transform on GPU in shared memory for the AstroAccelerate project ☆26 · Updated 4 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 9 years ago
- ☆32 · Updated 3 years ago
- QCD for Intel Xeon Phi and Xeon processors ☆14 · Updated 7 months ago
- A GPU based FX correlator for radio astronomy ☆34 · Updated 6 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆45 · Updated 9 years ago
- FFTX Project ☆19 · Updated last week
- MATLAB Code for Parameters of Floating-Point Arithmetics ☆9 · Updated 2 years ago
- Kernel Tuning Toolkit ☆55 · Updated last week
- The SparseX sparse kernel optimization library ☆39 · Updated 5 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆35 · Updated 7 years ago
- Parallel selection on GPUs ☆14 · Updated 3 years ago
- Some C++ codes for computing a 1D and 2D convolution product using the FFT implemented with the GSL or FFTW ☆57 · Updated 11 years ago
- High Availability Shared Pipeline Engine ☆15 · Updated last year
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 · Updated 3 months ago
- Parallel Tensor Infrastructure (ParTI!) ☆28 · Updated 4 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 · Updated 7 years ago
- The fftMPI library performs 2d/3d FFTs in parallel for grids distributed across MPI processes. ☆14 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- Parallel nonequispaced fast Fourier transforms ☆16 · Updated 6 years ago
- A Massively Parallel FFT Library for CPU/GPU ☆54 · Updated 4 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆65 · Updated last year
- A Task-based Library for Solving Dense Nonsymmetric Eigenvalue Problems ☆21 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- Benchmark Suite for Heterogeneous FFT Implementations ☆34 · Updated 10 months ago
- CUDA Tensor Transpose (cuTT) library ☆49 · Updated 7 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆125 · Updated last year
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆21 · Updated last year