kisupov / mpres-blas
Multiple-precision GPU-accelerated linear algebra routines (dense and sparse) based on the residue number system
☆18 Updated 2 years ago
Alternatives and similar repositories for mpres-blas
Users who are interested in mpres-blas are comparing it to the libraries listed below.
- Next generation library for iterative sparse solvers for ROCm platform ☆81 Updated this week
- Computations in residue number system using CUDA-enabled GPUs ☆13 Updated 4 years ago
- A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection ☆23 Updated 3 weeks ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 Updated 3 months ago
- cuASR: CUDA Algebra for Semirings ☆36 Updated 2 years ago
- Kernel Tuning Toolkit ☆60 Updated last month
- Subset of BLAS routines optimized for NVIDIA GPUs ☆69 Updated 2 years ago
- BLAS implementation for Intel FPGA ☆78 Updated 4 years ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆33 Updated 4 months ago
- AMD optimized Sparse Linear Algebra library ☆32 Updated last week
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆77 Updated 3 weeks ago
- Round matrix elements to lower precision in MATLAB ☆37 Updated 3 years ago
- A unified framework across multiple programming platforms ☆41 Updated 3 weeks ago
- ☆21 Updated 3 years ago
- ☆32 Updated 4 years ago
- ROCm SPARSE marshalling library ☆67 Updated this week
- nvptx-tools: a collection of tools for use with nvptx-none GCC toolchains. ☆50 Updated 9 months ago
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆43 Updated last year
- Next generation LAPACK implementation for ROCm platform ☆103 Updated last week
- A 128 bit unsigned integer class for CUDA ☆46 Updated 5 months ago
- This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger. ☆59 Updated last week
- A GPU performance prediction toolkit for CUDA programs ☆16 Updated 6 years ago
- Custom-Precision Floating-point numbers. ☆36 Updated 5 months ago
- Recursive LAPACK Collection ☆42 Updated 3 years ago
- MLIR tools and dialect for GraphBLAS ☆18 Updated 3 years ago
- High-performance Geometric Multigrid ☆37 Updated 6 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆73 Updated 3 months ago
- Omni Compiler for C and Fortran programs with XcalableMP and OpenACC directives ☆61 Updated last year
- Benchmarking OpenBLAS on the Apple M1 ☆18 Updated 4 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago