KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆17 · Updated 3 weeks ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution
Users interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below.
- CUDA-based implementation of linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 10 years ago
- Fast Fourier transform on the GPU in shared memory for the AstroAccelerate project. ☆27 · Updated 4 years ago
- A GPU-based FX correlator for radio astronomy. ☆36 · Updated 7 years ago
- C++ code for computing 1D and 2D convolution products via the FFT, implemented with GSL or FFTW. ☆59 · Updated 12 years ago
- ☆41 · Updated 4 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆140 · Updated 2 years ago
- CUDA toolset for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 · Updated 3 years ago
- CUDA Tensor Transpose (cuTT) library. ☆52 · Updated 8 years ago
- AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data. ☆50 · Updated last month
- BLISlab: A Sandbox for Optimizing GEMM. ☆536 · Updated 4 years ago
- Choosing an FFT library... ☆156 · Updated 3 years ago
- The SparseX sparse kernel optimization library. ☆41 · Updated 6 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth. ☆106 · Updated 8 years ago
- CUDA-based de-dispersion library. ☆11 · Updated last year
- Online CUDA Occupancy Calculator. ☆80 · Updated 3 years ago
- Kernel Tuning Toolkit. ☆64 · Updated 2 months ago
- ulmBLAS. ☆108 · Updated 3 months ago
- Source code that accompanies The CUDA Handbook. ☆539 · Updated 7 months ago
- Simple OpenCL examples for exploiting GPU computing. ☆223 · Updated last year
- A GPU implementation of the Wavelet Transform. ☆79 · Updated 4 years ago
- CUDA implementation of the fundamental sum-reduce operation. Aims to be as optimized as reasonable. ☆39 · Updated 8 years ago
- Parallel Tensor Infrastructure (ParTI!). ☆30 · Updated 5 years ago
- Advanced Vector Extensions (AVX) basic tutorial. ☆37 · Updated 4 years ago
- Testing different implementations of Atan2. ☆11 · Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs. ☆72 · Updated 2 years ago
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme. ☆86 · Updated 5 months ago
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆96 · Updated last year
- Vector Math Library. ☆81 · Updated last week
- QCD for Intel Xeon Phi and Xeon processors. ☆14 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. ☆195 · Updated this week