danielchalef / openblas-benchmark-m1
Benchmarking OpenBLAS on the Apple M1
☆18 Updated 4 years ago
Alternatives and similar repositories for openblas-benchmark-m1:
Users who are interested in openblas-benchmark-m1 are comparing it to the libraries listed below
- cuASR: CUDA Algebra for Semirings ☆35 Updated 2 years ago
- Round matrix elements to lower precision in MATLAB ☆36 Updated 2 years ago
- A pseudo random number generator library written against the SYCL API. ☆12 Updated 5 years ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures ☆49 Updated last year
- Worked example of the process from Python source to CUDA kernel execution with Numba ☆37 Updated 5 months ago
- ☆14 Updated 2 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 Updated 5 years ago
- ☆51 Updated 6 months ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- A unified framework across multiple programming platforms ☆36 Updated 7 months ago
- Next generation library for iterative sparse solvers for ROCm platform ☆78 Updated this week
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 Updated 2 years ago
- ☆20 Updated 3 years ago
- MLIR tools and dialect for GraphBLAS ☆18 Updated 2 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated last year
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆77 Updated this week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 Updated 2 months ago
- Tensor Contraction Code Generator ☆36 Updated 7 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆70 Updated this week
- Error-Free Transformations as building blocks for compensated algorithms ☆14 Updated last year
- Multiple-precision GPU accelerated linear algebra routines (dense and sparse) based on residue number system ☆17 Updated 2 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆55 Updated this week
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 Updated 9 years ago
- Flexible and performant GEMM kernels in Julia ☆80 Updated 3 months ago
- BLAS implementation for Intel FPGA ☆76 Updated 4 years ago
- Linnea is an experimental tool for the automatic generation of optimized code for linear algebra problems. ☆68 Updated 3 years ago
- ☆13 Updated 5 years ago
- Strassen's Algorithm for Tensor Contraction ☆12 Updated 7 years ago
- Automatic High-Order Optimization for Tensors ☆23 Updated last year
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction ☆65 Updated 4 months ago