roguh / cuda-fft
Yet another FFT implementation in CUDA. Includes benchmarks using simple data for comparing different implementations.
☆11Updated 3 years ago
Alternatives and similar repositories for cuda-fft:
Users who are interested in cuda-fft are comparing it to the repositories listed below.
- Fast Fourier Transform Acceleration Algorithm. (Accelerated by CUDA)☆11Updated 6 years ago
- A demo of the fast Fourier transform in CUDA, implemented with the Cooley-Tukey and Stockham methods☆8Updated 7 years ago
- Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the deve…☆13Updated 6 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA☆16Updated 2 years ago
- Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.☆37Updated last year
- Fast Fourier transform on GPU in shared memory for the AstroAccelerate project☆26Updated 4 years ago
- Example code for Intel AVX / AVX2 intrinsics.☆134Updated last year
- Introduction to Intel AVX-512☆43Updated last year
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading.☆127Updated 3 years ago
- ☆37Updated 3 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆210Updated 2 months ago
- I implemented a parallel algorithm for matrix inversion based on Gauss-Jordan elimination.☆45Updated 9 years ago
- QCD for Intel Xeon Phi and Xeon processors☆14Updated 11 months ago
- A Chinese translation of the English edition of "Heterogeneous Computing with OpenCL 2.0".☆131Updated 4 years ago
- 14 basic topics for VEGA64 performance optimization☆52Updated 3 years ago
- ☆11Updated 5 years ago
- ☆420Updated 9 years ago
- CUDA-based implementation for linear 1D, 2D and 3D FFT-Shift functions.☆22Updated 9 years ago
- My notes on various HPC papers.☆21Updated 2 years ago
- IMPACT GPU Algorithms Teaching Labs☆56Updated last year
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆36Updated 7 years ago
- CUDA Tensor Transpose (cuTT) library☆51Updated 7 years ago
- ☆228Updated this week
- Xiao's CUDA Optimization Guide [Actively Adding New Content]☆264Updated 2 years ago
- Software to support people learning OpenMP with our book ... The OpenMP Common Core: Making OpenMP Simple Again☆81Updated last year
- CUDA by practice☆121Updated 5 years ago
- Next generation FFT implementation for ROCm☆188Updated this week
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆124Updated 4 years ago
- oneAPI Math Library (oneMath)☆645Updated 3 weeks ago
- Step-by-step optimization of CUDA SGEMM☆285Updated 2 years ago