KAdamek / SMFFT
Fast Fourier transform on GPU in shared memory for the AstroAccelerate project
☆24 Updated 3 years ago
Related projects:
- ☆17 Updated this week
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆47 Updated last month
- PLASMA is a software package for solving problems in dense linear algebra using OpenMP ☆24 Updated last month
- Subset of BLAS routines optimized for NVIDIA GPUs ☆63 Updated last year
- Kernel Tuning Toolkit ☆54 Updated 3 weeks ago
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 ☆39 Updated 7 months ago
- MagmaDNN: a simple deep learning framework in c++ ☆45 Updated 4 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 Updated 5 years ago
- sparse matrix pre-processing library ☆81 Updated 4 months ago
- QCD for Intel Xeon Phi and Xeon processors ☆13 Updated 6 months ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆21 Updated last week
- ☆30 Updated 3 years ago
- A C++ allocator based on cudaMallocManaged ☆23 Updated 5 years ago
- Shared memory overlap-and-save method for NVIDIA GPUs using CUDA ☆15 Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆38 Updated last year
- Next generation library for iterative sparse solvers for ROCm platform ☆74 Updated this week
- The fftMPI library performs 2d/3d FFTs in parallel for grids distributed across MPI processes. ☆13 Updated 2 years ago
- A Massively Parallel FFT Library for CPU/GPU ☆54 Updated 3 years ago
- The SparseX sparse kernel optimization library ☆39 Updated 5 years ago
- Comb is a communication performance benchmarking tool. ☆23 Updated last year
- The Task-Aware MPI (TAMPI) library extends the functionality of standard MPI libraries by providing new mechanisms for improving the inte… ☆23 Updated 4 months ago
- Experimental Linear Algebra Performance Studies ☆12 Updated 7 years ago
- ☆21 Updated 3 weeks ago
- List all available information about all SYCL devices and platforms ☆15 Updated 4 years ago
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… ☆20 Updated last year
- ☆14 Updated 3 years ago
- FFTX Project ☆18 Updated 4 months ago
- Next generation LAPACK implementation for ROCm platform ☆91 Updated this week
- Kokkos Remote Spaces implements distributed Kokkos Views and related APIs for distributed parallel programming. ☆42 Updated 2 weeks ago