KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆16 · Updated 2 years ago
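The repository above targets NVIDIA GPUs with CUDA and shared memory. As an illustration of the overlap-and-save idea itself, here is a minimal CPU sketch in Python. It is not the repository's implementation. The function names `fft` and `overlap_save`, the segment length parameter `n_fft`, and the pure-Python radix-2 FFT are assumptions chosen to keep the example self-contained. The input is split into segments of length N that overlap by M−1 samples, where M is the filter length. Each segment is circularly convolved with the filter via the FFT. The first M−1 outputs of every segment are corrupted by wrap-around and discarded.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1  # forward transform uses exp(-2*pi*i*k/n)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def overlap_save(x, h, n_fft=16):
    """Linear convolution of x with filter h using overlap-and-save (hypothetical helper)."""
    m = len(h)
    if n_fft < m or (n_fft & (n_fft - 1)) != 0:
        raise ValueError("n_fft must be a power of two and at least len(h)")
    step = n_fft - (m - 1)  # valid (non-aliased) output samples per segment
    H = fft(list(h) + [0.0] * (n_fft - m))  # filter spectrum, computed once
    out_len = len(x) + m - 1
    # Prepend m-1 zeros as "history" for the first segment; pad the tail so
    # every slice below has exactly n_fft samples.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * n_fft
    y = []
    for start in range(0, out_len, step):
        seg = padded[start:start + n_fft]
        # Circular convolution of the segment with h via pointwise spectrum product.
        Y = fft([a * b for a, b in zip(fft(seg), H)], invert=True)
        # Discard the first m-1 samples (wrap-around); keep `step` valid ones.
        y.extend(v.real / n_fft for v in Y[m - 1:])
    return y[:out_len]
```

Because each segment is processed independently, the segments can be handled in parallel, which is what makes the method a natural fit for GPUs.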
Related projects
Alternatives and complementary repositories for GPU_Overlap-and-save_convolution
- Fast Fourier transform on the GPU in shared memory for the AstroAccelerate project ☆26 · Updated 4 years ago
- ☆36 · Updated 3 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 9 years ago
- A GPU-based FX correlator for radio astronomy ☆34 · Updated 6 years ago
- Kernel Tuning Toolkit ☆55 · Updated 3 weeks ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 · Updated 2 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 · Updated 8 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 · Updated 3 months ago
- MagmaDNN: a simple deep learning framework in C++ ☆45 · Updated 4 years ago
- C++ code for computing 1D and 2D convolution products using the FFT implemented with GSL or FFTW ☆57 · Updated 11 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆128 · Updated last year
- ☆30 · Updated 4 years ago
- AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data. ☆42 · Updated last month
- FFTX Project ☆19 · Updated 3 weeks ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- High Availability Shared Pipeline Engine ☆15 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- The SparseX sparse kernel optimization library ☆39 · Updated 5 years ago
- Next-generation FFT implementation for ROCm ☆176 · Updated this week
- CAKE library for constant-bandwidth matrix multiplication on CPUs ☆14 · Updated 7 months ago
- Choosing FFT library... ☆138 · Updated 2 years ago
- My notes on various HPC papers. ☆21 · Updated last year
- Simple OpenCL samples that build with Khronos headers and libs ☆88 · Updated last week
- ☆218 · Updated last week
- Parallel selection on GPUs ☆14 · Updated 3 years ago
- Benchmark suite for heterogeneous FFT implementations ☆34 · Updated 10 months ago
- Fork of https://gitlab.mpcdf.mpg.de/mtr/pocketfft to simplify external contributions ☆75 · Updated 3 months ago
- MPI accelerator-integrated communication extensions ☆32 · Updated last year