KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆17 · Updated last month
Alternatives and similar repositories for GPU_Overlap-and-save_convolution
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below.
- fast Fourier transform on GPU in shared memory for AstroAccelerate project ☆27 · Updated 4 years ago
- choosing FFT library... ☆158 · Updated 3 years ago
- Kernel Tuning Toolkit ☆65 · Updated last week
- Some C++ codes for computing a 1D and 2D convolution product using the FFT implemented with the GSL or FFTW ☆59 · Updated 12 years ago
- a software library containing FFT functions written in OpenCL ☆641 · Updated 3 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 · Updated last year
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 · Updated 3 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆39 · Updated 8 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆182 · Updated 2 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions. ☆22 · Updated 10 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆199 · Updated last week
- Code appendix to an OpenCL matrix-multiplication tutorial ☆178 · Updated 8 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 · Updated 3 years ago
- Online CUDA Occupancy Calculator ☆80 · Updated 4 years ago
- Learn OpenCL step by step. ☆136 · Updated 3 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆278 · Updated 6 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 · Updated 9 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆107 · Updated 8 years ago
- ☆40 · Updated 4 years ago
- The SHOC Benchmark Suite ☆257 · Updated last week
- Example code for Intel AVX / AVX2 intrinsics. ☆141 · Updated 2 years ago
- Source code that accompanies The CUDA Handbook. ☆548 · Updated last week
- BLISlab: A Sandbox for Optimizing GEMM ☆540 · Updated 4 years ago
- Intel® GPU Compute Samples ☆109 · Updated last month
- ☆48 · Updated 5 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆73 · Updated 2 years ago
- Parallel selection on GPUs ☆15 · Updated 4 years ago
- a software library containing Sparse functions written in OpenCL ☆175 · Updated 5 years ago
- parallel algorithm based on CUDA ☆60 · Updated 7 years ago