KAdamek / SMFFT
Fast Fourier transform on GPU in shared memory for the AstroAccelerate project
☆27 Updated 4 years ago
Alternatives and similar repositories for SMFFT
Users who are interested in SMFFT are comparing it to the libraries listed below
- Subset of BLAS routines optimized for NVIDIA GPUs ☆73 Updated 2 years ago
- Kernel Tuning Toolkit ☆65 Updated last week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 Updated 3 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆199 Updated last week
- BLAS implementation for Intel FPGA ☆77 Updated 4 years ago
- A 128 bit unsigned integer class for CUDA ☆46 Updated 9 months ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆72 Updated 3 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 9 months ago
- Full-speed Array of Structures access ☆173 Updated 2 years ago
- Fast matrix multiplication ☆31 Updated 4 years ago
- sparse matrix pre-processing library ☆83 Updated last year
- A domain-specific language and compiler for image processing ☆76 Updated 4 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆79 Updated 2 months ago
- Multiple 1-stencil implementations using nvidia cuda. ☆13 Updated 7 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆21 Updated 8 years ago
- Concurrent CPU-GPU Programming using Task Models ☆103 Updated 5 years ago
- ☆31 Updated last month
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- CUDA tool set for non-C++ languages that provides similar functionality like Thrust, with NVRTC at its core. ☆59 Updated 3 years ago
- The SparseX sparse kernel optimization library ☆42 Updated 6 years ago
- CUDA accelerated(X) Multi-Precision library ☆92 Updated 9 years ago
- A C++ allocator based on cudaMallocManaged ☆23 Updated 6 years ago
- bhSPARSE: A Sparse BLAS Library ☆16 Updated 9 years ago
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆22 Updated 10 months ago
- tools to create performance and roofline plots from measured data ☆59 Updated 11 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆36 Updated 9 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA ☆17 Updated last month
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆130 Updated last week
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations ☆18 Updated 5 years ago
- a software library containing Sparse functions written in OpenCL ☆175 Updated 5 years ago