kisupov / mpres-blas
Multiple-precision GPU-accelerated linear algebra routines (dense and sparse) based on the residue number system
☆18, updated 2 years ago
Alternatives and similar repositories for mpres-blas
Users who are interested in mpres-blas compare it to the libraries listed below.
- cuASR: CUDA Algebra for Semirings ☆35, updated 2 years ago
- Next generation library for iterative sparse solvers for ROCm platform ☆81, updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆52, updated 2 months ago
- A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection ☆23, updated 2 weeks ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68, updated 2 years ago
- The CUDA Multiple Precision Arithmetic Library ☆46, updated 12 years ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆33, updated 3 months ago
- Round matrix elements to lower precision in MATLAB ☆37, updated 2 years ago
- CUDA Template Functions ☆19, updated 5 months ago
- Next generation LAPACK implementation for ROCm platform ☆101, updated this week
- Recursive LAPACK Collection ☆42, updated 3 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12, updated 3 years ago
- [CF '20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15, updated 4 years ago
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆43, updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆70, updated 2 months ago
- Benchmarking OpenBLAS on the Apple M1 ☆18, updated 4 years ago
- hipFFT is a FFT marshalling library. ☆63, updated this week
- Custom-Precision Floating-point numbers. ☆36, updated 4 months ago
- Yaksa: High-performance Noncontiguous Data Management ☆13, updated 8 months ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆75, updated this week
- Reusable software components for ROCm developers ☆84, updated this week
- Kernel Tuning Toolkit ☆59, updated 3 weeks ago
- Linnea is an experimental tool for the automatic generation of optimized code for linear algebra problems. ☆69, updated 3 years ago
- best CPU/GPU sparse solver for large sparse matrices ☆21, updated 3 years ago
- AMD optimized Sparse Linear Algebra library ☆29, updated last week
- fast Fourier transform on GPU in shared memory for AstroAccelerate project ☆26, updated 4 years ago
- ExBLAS: fast, accurate, and reproducible BLAS ☆13, updated 3 years ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware ☆16, updated 3 years ago
- Data Dependence Analyzer in the Polyhedral Model ☆20, updated last year