KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆17 · Updated 4 months ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below.
- Fast Fourier transform on GPU in shared memory for the AstroAccelerate project ☆27 · Updated 5 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 10 years ago
- Some C++ codes for computing a 1D and 2D convolution product using the FFT implemented with the GSL or FFTW ☆60 · Updated 12 years ago
- ☆43 · Updated 4 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆144 · Updated 2 years ago
- Testing different implementations of Atan2 ☆11 · Updated 4 years ago
- A GPU based FX correlator for radio astronomy ☆40 · Updated 7 years ago
- Matrix-Vector Multiplication Using Shared and Coalesced Memory Access ☆16 · Updated 12 years ago
- choosing FFT library... ☆164 · Updated 3 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆183 · Updated 3 years ago
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆98 · Updated last year
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆85 · Updated last year
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆39 · Updated 8 years ago
- ☆272 · Updated this week
- Parallel selection on GPUs ☆15 · Updated 4 years ago
- The SparseX sparse kernel optimization library ☆43 · Updated 7 years ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆118 · Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 · Updated 8 years ago
- ☆48 · Updated 5 years ago
- BLISlab: A Sandbox for Optimizing GEMM ☆553 · Updated 4 years ago
- Source code that accompanies The CUDA Handbook. ☆559 · Updated 3 months ago
- ☆97 · Updated 8 years ago
- Full-speed Array of Structures access ☆176 · Updated 2 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆283 · Updated 9 months ago
- A GPU implementation of the Wavelet Transform ☆83 · Updated 5 years ago
- GPU Code optimizer for stencil computations. Refer to our IPDPS'19 paper for more details ☆24 · Updated 6 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆64 · Updated 4 months ago
- C++ implementation of Fast Fourier Aliasing-based Sparse Transform ☆19 · Updated 9 years ago
- Statistics on GPUs ☆32 · Updated 4 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆76 · Updated 2 years ago