roguh / cuda-fft
Yet another FFT implementation in CUDA. Includes benchmarks using simple data for comparing different implementations.
☆13Updated 3 years ago
Alternatives and similar repositories for cuda-fft
Users that are interested in cuda-fft are comparing it to the libraries listed below
- A demo of the fast Fourier transform in CUDA, implemented with the Cooley-Tukey and Stockham methods☆8Updated 7 years ago
- Case studies constitute a modern interdisciplinary and valuable teaching practice which plays a critical and fundamental role in the deve…☆13Updated 6 years ago
- Fast Fourier Transform Acceleration Algorithm. (Accelerated by CUDA)☆11Updated 6 years ago
- ☆113Updated last year
- Fast Fourier transform on GPU in shared memory for the AstroAccelerate project☆26Updated 4 years ago
- Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading.☆148Updated 3 years ago
- Serial and parallel implementations of matrix multiplication☆41Updated 4 years ago
- Dissecting NVIDIA GPU Architecture☆97Updated 2 years ago
- A simple high performance CUDA GEMM implementation.☆382Updated last year
- This is an implementation of sgemm_kernel on L1d cache.☆228Updated last year
- ☆67Updated 11 years ago
- Sample code from the book "Professional CUDA C Programming"☆35Updated 2 years ago
- CUDA C++ syntax support & snippets for VSCode☆20Updated 4 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆275Updated 2 weeks ago
- Example code for Intel AVX / AVX2 intrinsics.☆138Updated last year
- Examples from Programming in Parallel with CUDA☆153Updated 2 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA☆16Updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆137Updated 4 years ago
- High-performance computing☆20Updated 5 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.☆357Updated 5 months ago
- Yinghan's Code Sample☆335Updated 2 years ago
- 14 basic topics for VEGA64 performance optimization☆56Updated 4 years ago
- ☆276Updated 4 years ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆67Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆32Updated 4 years ago
- ☆40Updated 4 years ago
- CUDA Matrix Multiplication Optimization☆196Updated 11 months ago
- A personal translation of "Data Parallel C++"☆75Updated 3 years ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆183Updated 5 months ago
- An introduction to Intel AVX-512☆49Updated last year