KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆17 Updated 2 months ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below
- Fast Fourier transform on the GPU in shared memory for the AstroAccelerate project ☆27 Updated 5 years ago
- Code appendix to an OpenCL matrix-multiplication tutorial ☆178 Updated 8 years ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆116 Updated this week
- ☆41 Updated 4 years ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 Updated 3 years ago
- choosing FFT library... ☆160 Updated 3 years ago
- C++ code for computing 1D and 2D convolution products using the FFT, implemented with GSL or FFTW ☆59 Updated 12 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆108 Updated 8 years ago
- Examples for HIP ☆211 Updated 11 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆73 Updated 2 years ago
- ☆267 Updated last week
- Learn OpenCL step by step. ☆135 Updated 3 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 Updated 3 years ago
- BLISlab: A Sandbox for Optimizing GEMM ☆546 Updated 4 years ago
- A tool which profiles OpenCL devices to find their peak capacities ☆474 Updated 4 months ago
- An implementation of SGEMV with performance comparable to cuBLAS. ☆11 Updated 4 years ago
- Simple OpenCL examples for exploiting GPU computing ☆223 Updated last year
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆39 Updated 8 years ago
- Online CUDA Occupancy Calculator ☆80 Updated 4 years ago
- ☆48 Updated 5 years ago
- Intel® GPU Compute Samples ☆109 Updated last month
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated 9 months ago
- Example code for Intel AVX / AVX2 intrinsics. ☆142 Updated 2 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- A Collection of Articles and other OpenCL Papers ☆59 Updated 6 years ago
- ☆71 Updated 11 years ago
- A software library containing sparse functions written in OpenCL ☆175 Updated 5 years ago
- Automatically exported from code.google.com/p/math-neon ☆40 Updated 10 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆182 Updated 2 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆281 Updated 7 months ago