andre-wojtowicz / blas-benchmarks
Timing results for BLAS (Basic Linear Algebra Subprograms) libraries in R
☆31 Updated 8 years ago
Alternatives and similar repositories for blas-benchmarks:
Users who are interested in blas-benchmarks are comparing it to the libraries listed below.
- sparse matrix pre-processing library ☆81 Updated 11 months ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Codebase associated with the PyTorch compiler tutorial ☆45 Updated 5 years ago
- Fast matrix multiplication ☆29 Updated 3 years ago
- xtensor plugin to read and write images, audio files, numpy (compressed) npz and HDF5 ☆86 Updated last year
- Benchmark of expression templates libraries ☆41 Updated 4 years ago
- ☆31 Updated 3 years ago
- Full-speed Array of Structures access ☆169 Updated 2 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 3 months ago
- SYCL-ML is a C++ library, implementing classical machine learning algorithms using SYCL. ☆66 Updated 5 years ago
- This repository is the summary of all of our works for the XLA. ☆11 Updated 7 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 7 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated 2 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated 11 months ago
- CNNs in Halide ☆23 Updated 9 years ago
- Range-based for loops to iterate over a range of numbers or values ☆35 Updated 8 years ago
- Generalized Histograms for CUDA-capable GPUs ☆43 Updated 9 years ago
- Flexible Library for Efficient Numerical Solutions ☆127 Updated 3 years ago
- Implementation of the SYCL specification. ☆66 Updated 10 months ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- Test winograd convolution written in TVM for CUDA and AMDGPU ☆41 Updated 6 years ago
- npcomp - An aspirational MLIR based numpy compiler ☆51 Updated 4 years ago
- Kernel Tuning Toolkit ☆59 Updated last month
- Recursive LAPACK Collection ☆42 Updated 3 years ago
- ulmBLAS ☆105 Updated 3 years ago
- CUDA kernel author's tools ☆111 Updated 3 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 Updated last year
- A library to benchmark CUDA code, similar to google benchmark. ☆28 Updated 4 years ago
- Easy to use benchmarks for linear algebra frameworks ☆24 Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago