KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆16 Updated 2 years ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below.
- fast Fourier transform on GPU in shared memory for AstroAccelerate project ☆26 Updated 4 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 Updated 9 years ago
- ☆40 Updated 4 years ago
- A GPU based FX correlator for radio astronomy ☆35 Updated 7 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆37 Updated 7 years ago
- Benchmarking OpenBLAS on the Apple M1 ☆18 Updated 4 years ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- A Massively Parallel FFT Library for CPU/GPU ☆56 Updated 4 years ago
- Bonsai GPU tree code ☆71 Updated last year
- Parallel selection on GPUs ☆16 Updated 4 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆54 Updated 3 months ago
- Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU. ☆38 Updated 2 years ago
- High Availability Shared Pipeline Engine ☆15 Updated last year
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆15 Updated last year
- ☆67 Updated 11 years ago
- Kernel Tuning Toolkit ☆60 Updated last month
- ☆44 Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 Updated 7 years ago
- QCD for Intel Xeon Phi and Xeon processors ☆14 Updated last year
- Examples for HIP ☆208 Updated 6 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆69 Updated 2 years ago
- Resources for the introduction to GPU programming course of the HPC-AI specialized master's program ☆21 Updated last year
- Some C++ codes for computing a 1D and 2D convolution product using the FFT implemented with the GSL or FFTW ☆58 Updated 12 years ago
- Parallel nonequispaced fast Fourier transforms ☆16 Updated 7 years ago
- A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations. ☆101 Updated 11 years ago
- Next generation library for iterative sparse solvers for ROCm platform ☆81 Updated this week
- CUDA Based De-dispersion library ☆11 Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 Updated 3 years ago
- portFFT is a library implementing Fast Fourier Transforms using SYCL ☆17 Updated 3 months ago