Serial and parallel implementations of matrix multiplication
☆46Feb 19, 2021Updated 5 years ago
Alternatives and similar repositories for mmul
Users that are interested in mmul are comparing it to the libraries listed below.
- Simple and efficient memory pool is implemented with C++11.☆10Jun 2, 2022Updated 3 years ago
- A simple trace-based cache simulator☆16Jan 3, 2025Updated last year
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch☆941Jul 19, 2023Updated 2 years ago
- CK workflow, portable packages and other artifacts for the ReQuEST-ASPLOS'18 submission:☆12Jan 16, 2019Updated 7 years ago
- Implementation of the CCSDS TM and TC standards for the AcubeSAT nanosatellite☆18Dec 22, 2025Updated 3 months ago
- Example code from Parallel Programming in C with MPI and OpenMP☆11Feb 24, 2021Updated 5 years ago
- ☆12Apr 16, 2024Updated last year
- ☆17Sep 15, 2021Updated 4 years ago
- ☆16Oct 23, 2022Updated 3 years ago
- Examples from the "C++ From Scratch" Series☆103Feb 6, 2023Updated 3 years ago
- This repository contains an implementation for design patterns detection. In this task, feature engineering and ensemble learning are app…☆10Jul 30, 2022Updated 3 years ago
- How to use node-local MPI rank IDs to manually map MPI ranks to GPUs☆14Apr 22, 2020Updated 5 years ago
- High Availability Shared Pipeline Engine☆17Sep 15, 2023Updated 2 years ago
- Automatic Conversion of Source Code for C to CUDA C☆23Apr 1, 2014Updated 11 years ago
- portFFT is a library implementing Fast Fourier Transforms using SYCL☆19Mar 1, 2025Updated last year
- 'Build a Full-Stack Twitter Clone with Rust' course code and notes☆14Aug 6, 2023Updated 2 years ago
- Matlab mex wrappers to cuSPARSE (NVIDIA)☆11Dec 10, 2025Updated 3 months ago
- Misc simulation cases☆14Dec 18, 2017Updated 8 years ago
- ☆11Jul 2, 2023Updated 2 years ago
- Deploying an ML Model in a Task Queue☆11Jul 9, 2024Updated last year
- Ukrainian ELECTRA model☆12Mar 11, 2023Updated 3 years ago
- CUDA C simple application for Nvidia's GPU☆11Jun 7, 2022Updated 3 years ago
- Parallel optimization algorithms for sparse matrix-vector multiplication (OpenMP, AVX)☆11Jul 7, 2021Updated 4 years ago
- A molecular integral code generator☆12Feb 7, 2016Updated 10 years ago
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs☆13Apr 3, 2025Updated 11 months ago
- ☆10Aug 18, 2025Updated 7 months ago
- Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.☆16Apr 24, 2023Updated 2 years ago
- Automatically exported from code.google.com/p/ftke☆18Dec 3, 2015Updated 10 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆12Aug 12, 2022Updated 3 years ago
- An example to implement PBC SCF☆14Jul 10, 2018Updated 7 years ago
- A tutorial/example of the Python C-API and integration with CUDA kernels.☆14Jul 7, 2019Updated 6 years ago
- ☆14May 21, 2024Updated last year
- Optimized Deep Learning training course for Jean Zay☆19Oct 20, 2025Updated 5 months ago
- Wishbone to ARM AMBA 4 AXI☆16May 25, 2019Updated 6 years ago
- Using the QOI image format to save sequences of images☆10Feb 12, 2022Updated 4 years ago
- Software-based rasterization library☆11Jan 30, 2023Updated 3 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding☆16Oct 20, 2021Updated 4 years ago
- DHCP server that talks GRPC☆15Jul 8, 2017Updated 8 years ago
- JIT-compiled GPU kernels for quantum chemistry☆31Jan 30, 2026Updated last month