KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆16Updated 2 years ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution:
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below:
- Fast Fourier transform on GPU in shared memory for the AstroAccelerate project☆26Updated 4 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions.☆22Updated 9 years ago
- ☆40Updated 3 years ago
- High Availability Shared Pipeline Engine☆15Updated last year
- A GPU based FX correlator for radio astronomy☆35Updated 6 years ago
- CUDA Based De-dispersion library☆11Updated 10 months ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.☆59Updated 2 years ago
- Benchmark for popular fft libraries - fftw | cufftw | cufft☆16Updated 6 years ago
- OpenMPL (Open Math Performance Library) is a collection of open source math libraries, including BLAS, LAPACK, FFT, VML, and others.☆18Updated last year
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support☆53Updated last month
- CAKE Library for constant-bandwidth matrix multiplication on CPUs☆15Updated last year
- OTFFT is a high-speed FFT library using Stockham's algorithm and AVX.☆22Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆68Updated 2 years ago
- choosing FFT library...☆147Updated 2 years ago
- ☆43Updated 4 years ago
- MagmaDNN: a simple deep learning framework in c++☆49Updated 4 years ago
- Kernel Tuning Toolkit☆59Updated last month
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆104Updated 7 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.☆38Updated last year
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆37Updated 7 years ago
- Some C++ code for computing 1D and 2D convolution products using the FFT implemented with the GSL or FFTW☆58Updated 11 years ago
- Nonuniform fast Fourier transforms of types 1 and 2, in 1D, 2D, and 3D, on the GPU☆87Updated last year
- AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.☆46Updated 3 months ago
- A thread safe simple C++ wrapper for FFTW & MKL☆15Updated 3 years ago
- Bonsai GPU tree code☆70Updated 11 months ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures☆50Updated last year
- Parallel nonequispaced fast Fourier transforms☆16Updated 6 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks☆23Updated last year
- ☆67Updated 11 years ago