KAdamek / SMFFT
Fast Fourier transform on the GPU in shared memory, for the AstroAccelerate project
☆27 Updated 4 years ago
Alternatives and similar repositories for SMFFT
Users who are interested in SMFFT are comparing it to the libraries listed below.
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆196 Updated last week
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- A 128 bit unsigned integer class for CUDA ☆46 Updated 8 months ago
- BLAS implementation for Intel FPGA ☆77 Updated 4 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 Updated 7 months ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆72 Updated 2 years ago
- QCD for Intel Xeon Phi and Xeon processors ☆14 Updated last year
- A domain-specific language and compiler for image processing ☆76 Updated 4 years ago
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆22 Updated 8 months ago
- SST Macro Element Library ☆37 Updated 2 months ago
- tools to create performance and roofline plots from measured data ☆59 Updated 11 years ago
- sparse matrix pre-processing library ☆83 Updated last year
- Full-speed Array of Structures access ☆174 Updated 2 years ago
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 Updated 7 years ago
- Next generation library for iterative sparse solvers for ROCm platform ☆85 Updated this week
- Fast matrix multiplication ☆29 Updated 4 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆208 Updated 3 months ago
- Tensor Contraction Code Generator ☆38 Updated 8 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆119 Updated this week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆94 Updated 3 years ago
- CUDA Tensor Transpose (cuTT) library ☆52 Updated 8 years ago
- ☆19 Updated 3 weeks ago
- Kernel Tuning Toolkit ☆64 Updated 2 months ago
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆96 Updated last year
- Library to plot integer sets and maps ☆53 Updated 8 years ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆71 Updated 3 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆78 Updated 3 weeks ago
- RAJA Performance Suite ☆121 Updated last week