kisupov / mpres-blas
Multiple-precision GPU-accelerated linear algebra routines (dense and sparse) based on the residue number system
☆17 · Updated 2 years ago
Alternatives and similar repositories for mpres-blas:
Users that are interested in mpres-blas are comparing it to the libraries listed below
- Next generation library for iterative sparse solvers for ROCm platform ☆79 · Updated this week
- cuASR: CUDA Algebra for Semirings ☆35 · Updated 2 years ago
- Kernel Tuning Toolkit ☆56 · Updated 2 months ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆70 · Updated last month
- Subset of BLAS routines optimized for NVIDIA GPUs ☆67 · Updated last year
- A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection ☆22 · Updated last month
- Recursive LAPACK Collection ☆42 · Updated 2 years ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures ☆49 · Updated last year
- Reference implementation of the draft C++ GraphBLAS specification. ☆30 · Updated 11 months ago
- best CPU/GPU sparse solver for large sparse matrices ☆20 · Updated 3 years ago
- MagmaDNN: a simple deep learning framework in c++ ☆49 · Updated 4 years ago
- BLAS implementation for Intel FPGA ☆76 · Updated 4 years ago
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… ☆20 · Updated 2 years ago
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆42 · Updated last year
- High-performance Geometric Multigrid ☆33 · Updated 5 years ago
- A unified framework across multiple programming platforms ☆35 · Updated 7 months ago
- Round matrix elements to lower precision in MATLAB ☆36 · Updated 2 years ago
- High-Performance Reproducible BLAS using posit arithmetic ☆12 · Updated 2 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- Next generation LAPACK implementation for ROCm platform ☆98 · Updated this week
- 🎃 GPU load-balancing library for regular and irregular computations. ☆59 · Updated 7 months ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 7 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- A 128 bit unsigned integer class for CUDA ☆43 · Updated 3 weeks ago
- Omni Compiler for C and Fortran programs with XcalableMP and OpenACC directives ☆61 · Updated last year
- A GPU algorithm for sparse matrix-matrix multiplication ☆67 · Updated 4 years ago
- ExBLAS: fast, accurate, and reproducible BLAS ☆13 · Updated 3 years ago
- Benchmarking OpenBLAS on the Apple M1 ☆18 · Updated 4 years ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 · Updated 4 years ago
- sparse matrix pre-processing library ☆81 · Updated 8 months ago