KAdamek / GPU_Overlap-and-save_convolution
Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
☆16Updated 2 years ago
Alternatives and similar repositories for GPU_Overlap-and-save_convolution:
Users who are interested in GPU_Overlap-and-save_convolution are comparing it to the libraries listed below.
- fast Fourier transform on GPU in shared memory for AstroAccelerate project☆26Updated 4 years ago
- ☆37Updated 3 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions.☆22Updated 9 years ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.☆59Updated 2 years ago
- High Availability Shared Pipeline Engine☆15Updated last year
- Kernel Tuning Toolkit☆58Updated 2 weeks ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs☆97Updated last week
- Parallel nonequispaced fast Fourier transforms☆16Updated 6 years ago
- A Massively Parallel FFT Library for CPU/GPU☆55Updated 4 years ago
- ☆34Updated 4 years ago
- The SparseX sparse kernel optimization library☆39Updated 6 years ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures☆49Updated last year
- List all available information about all SYCL devices and platforms☆15Updated 4 years ago
- Parallel selection on GPUs☆15Updated 3 years ago
- Benchmarking OpenBLAS on the Apple M1☆18Updated 4 years ago
- sparse matrix pre-processing library☆81Updated 9 months ago
- A C++ allocator based on cudaMallocManaged☆23Updated 6 years ago
- Some C++ codes for computing a 1D and 2D convolution product using the FFT implemented with the GSL or FFTW☆58Updated 11 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications☆28Updated 8 years ago
- A GPU based FX correlator for radio astronomy☆35Updated 6 years ago
- A hierarchical matrix C/C++ library☆23Updated last week
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri…☆21Updated 2 years ago
- Error-Free Transformations as building blocks for compensated algorithms☆14Updated last year
- High-Performance Reproducible BLAS using posit arithmetic☆12Updated 2 years ago
- My notes on various HPC papers.☆21Updated 2 years ago
- Recursive LAPACK Collection☆42Updated 3 years ago
- Distributed Performance-portable Stencil Computation☆9Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆104Updated 7 years ago
- A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection☆22Updated 2 months ago
- BLAS++ is a C++ wrapper around CPU and GPU BLAS (basic linear algebra subroutines), developed as part of the SLATE project.☆74Updated this week