kisupov / mpres-blas
Multiple-precision GPU accelerated linear algebra routines (dense and sparse) based on residue number system
☆17 Updated 2 years ago
Alternatives and similar repositories for mpres-blas:
Users who are interested in mpres-blas are comparing it to the libraries listed below.
- Next generation library for iterative sparse solvers for ROCm platform ☆78 Updated this week
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures ☆49 Updated last year
- cuASR: CUDA Algebra for Semirings ☆35 Updated 2 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆49 Updated 4 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- Recursive LAPACK Collection ☆42 Updated 3 years ago
- BLAS implementation for Intel FPGA ☆76 Updated 4 years ago
- Round matrix elements to lower precision in MATLAB ☆36 Updated 2 years ago
- High-performance Geometric Multigrid ☆33 Updated 5 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last year
- This repository mirrors the principal Gitlab repository of the Chebyshev Accelerated Subspace iteration Eigensolver. If you want to contr… ☆16 Updated last week
- ROCm SPARSE marshalling library ☆67 Updated this week
- a tester for BLAS libraries including OpenBLAS and Intel MKL. This project is based on ATLAS BLAS Tester ☆34 Updated 2 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆55 Updated 2 weeks ago
- A unified framework across multiple programming platforms ☆36 Updated 8 months ago
- nvptx-tools: a collection of tools for use with nvptx-none GCC toolchains. ☆49 Updated 6 months ago
- sparse matrix pre-processing library ☆81 Updated 9 months ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆30 Updated 2 weeks ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆28 Updated 8 months ago
- High-Performance Reproducible BLAS using posit arithmetic ☆12 Updated 2 years ago
- CUDA tool set for non-C++ languages that provides similar functionality like Thrust, with NVRTC at its core. ☆59 Updated 2 years ago
- Next generation LAPACK implementation for ROCm platform ☆98 Updated last week
- Base container for developing C++ and Fortran HPC applications ☆18 Updated 2 years ago
- Reusable software components for ROCm developers ☆82 Updated this week
- hipFFT is a FFT marshalling library. ☆58 Updated last week
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆71 Updated 2 weeks ago
- A BUDE virtual-screening benchmark, in many programming models ☆26 Updated 4 months ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware ☆16 Updated 3 years ago
- Benchmarking OpenBLAS on the Apple M1 ☆18 Updated 4 years ago