talhasaruhan / cpp-matmul

Fast, multithreaded, AVX/FMA matrix multiplication kernel in C++17

☆18

Alternatives and similar repositories for cpp-matmul

Users interested in cpp-matmul are comparing it to the libraries listed below.

harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 · Updated 2 years ago
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 · Updated last year
MagmaDNN / magmadnn
MagmaDNN: a simple deep learning framework in c++
☆50 · Updated 4 years ago
eyalroz / cuda-kat
CUDA kernel author's tools
☆111 · Updated 3 years ago
jeffhammond / dpcpp-tutorial
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
☆93 · Updated 3 years ago
eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆207 · Updated 2 months ago
ecrc / kblas-gpu
Subset of BLAS routines optimized for NVIDIA GPUs
☆71 · Updated 2 years ago
sympiler / sympiler
Sympiler is a Code Generator for Transforming Sparse Matrix Codes
☆43 · Updated 2 years ago
gonzalobg / cpp_hpc_tutorial
C++ HPC Tutorial materials
☆54 · Updated last year
eth-cscs / DLA-Future
DLA-Future
☆76 · Updated this week
pkestene / MS-HPC-AI-GPU
Resources for the introductory GPU programming course of the HPC-AI specialized master's program (Mastère Spécialisé)
☆21 · Updated last year
LLNL / camp
Compiler-agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA
☆87 · Updated 3 weeks ago
SparseBLAS / spblas-reference
☆30 · Updated last week
fynv / ThrustRTC
CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.
☆59 · Updated 2 years ago
mark-poscablo / gpu-sum-reduction
CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆37 · Updated 8 years ago
Ahdhn / CUDATemplate
Template for starting a CUDA/C++ project using CMake, with GitHub Actions for CI
☆31 · Updated 3 weeks ago
gevtushenko / matrix_format_performance
☆29 · Updated 5 years ago
bryancatanzaro / trove
Full-speed Array of Structures access
☆171 · Updated 2 years ago
acdemiralp / mpi
Header-only C++20 wrapper for MPI 4.0.
☆47 · Updated last year
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆70 · Updated 2 weeks ago
LLNL / RAJAPerf
RAJA Performance Suite
☆118 · Updated last week
ROCm / rocSPARSE
Next-generation SPARSE implementation for the ROCm platform
☆129 · Updated last week
GraphBLAS / rgri
Reference implementation of the draft C++ GraphBLAS specification.
☆33 · Updated 5 months ago
codeplaysoftware / SYCL-For-CUDA-Examples
Examples for using SYCL on CUDA
☆62 · Updated 2 weeks ago
CoffeeBeforeArch / mmul
Serial and parallel implementations of matrix multiplication
☆42 · Updated 4 years ago
xmartlabs / cuda-calculator
Online CUDA Occupancy Calculator
☆79 · Updated 3 years ago
ROCm / rocThrust
[DEPRECATED] Moved to the ROCm/rocm-libraries repo
☆119 · Updated this week
LLNL / LULESH
Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH)
☆109 · Updated 2 years ago
ROCm / hipCUB
[DEPRECATED] Moved to the ROCm/rocm-libraries repo
☆84 · Updated last week
PatWie / cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
☆155 · Updated 2 years ago