KAdamek / SMFFT
fast Fourier transform on GPU in shared memory for AstroAccelerate project
☆27 Updated 5 years ago
Alternatives and similar repositories for SMFFT
Users that are interested in SMFFT are comparing it to the libraries listed below
- Subset of BLAS routines optimized for NVIDIA GPUs ☆73 Updated 2 years ago
- A 128 bit unsigned integer class for CUDA ☆46 Updated 10 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆198 Updated last week
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆37 Updated 9 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated 9 months ago
- Full-speed Array of Structures access ☆175 Updated 2 years ago
- sparse matrix pre-processing library ☆83 Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 Updated 3 years ago
- Kernel Tuning Toolkit ☆65 Updated 2 weeks ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆21 Updated 8 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆129 Updated last week
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated last year
- CUDA Tensor Transpose (cuTT) library ☆53 Updated 8 years ago
- BLAS implementation for Intel FPGA ☆77 Updated 4 years ago
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆22 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆124 Updated last week
- CUDA tool set for non-C++ languages that provides similar functionality like Thrust, with NVRTC at its core. ☆59 Updated 3 years ago
- A unified framework across multiple programming platforms ☆41 Updated 5 months ago
- Tensor Contraction Code Generator ☆39 Updated 8 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆115 Updated last week
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆48 Updated 10 years ago
- Multiple 1-stencil implementations using nvidia cuda. ☆13 Updated 7 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA ☆17 Updated 2 months ago
- Home of ALP/GraphBLAS and ALP/Pregel, featuring shared- and distributed-memory auto-parallelisation of linear algebraic and vertex-centri… ☆31 Updated this week
- Sparse matrix computation library for GPU ☆59 Updated 5 years ago
- High-Performance Tensor Transpose library ☆205 Updated 2 years ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆73 Updated 3 years ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆211 Updated last week
- CUDA accelerated(X) Multi-Precision library ☆92 Updated 9 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆50 Updated 5 years ago