danielchalef / openblas-benchmark-m1
Benchmarking OpenBLAS on the Apple M1
☆17 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for openblas-benchmark-m1
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆73 · Updated this week
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core. ☆59 · Updated 2 years ago
- cuASR: CUDA Algebra for Semirings ☆34 · Updated 2 years ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware ☆15 · Updated 2 years ago
- Error-Free Transformations as building blocks for compensated algorithms ☆14 · Updated last year
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 · Updated 9 years ago
- A task benchmark ☆39 · Updated 3 months ago
- HiCMA: Hierarchical Computations on Manycore Architectures ☆28 · Updated last year
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- TAU Performance System Public Mirror (Updated every night at midnight, USA Pacific Time) ☆39 · Updated this week
- Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP) ☆11 · Updated 3 months ago
- Tensor Contraction Code Generator ☆36 · Updated 7 years ago
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures ☆47 · Updated last year
- MagmaDNN: a simple deep learning framework in c++ ☆45 · Updated 4 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 · Updated 3 months ago
- MLIR tools and dialect for GraphBLAS ☆16 · Updated 2 years ago
- Recursive LAPACK Collection ☆42 · Updated 2 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 7 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆22 · Updated last month
- Exploring using stdpar and Cython ☆32 · Updated 3 years ago
- Multiple-precision GPU accelerated linear algebra routines (dense and sparse) based on residue number system ☆17 · Updated last year
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction ☆65 · Updated last month
- Round matrix elements to lower precision in MATLAB ☆35 · Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆65 · Updated last year