Serial and parallel implementations of matrix multiplication
☆46Feb 19, 2021Updated 5 years ago
Alternatives and similar repositories for mmul
Users that are interested in mmul are comparing it to the libraries listed below.
- A simple and efficient memory pool implemented in C++11.☆10Jun 2, 2022Updated 4 years ago
- A simple trace-based cache simulator☆16Jan 3, 2025Updated last year
- A simulation of the Tomasulo algorithm, a hardware algorithm for out-of-order scheduling and execution of computer instructions, written …☆17Apr 22, 2017Updated 9 years ago
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware☆15Mar 1, 2022Updated 4 years ago
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch☆963Jul 19, 2023Updated 2 years ago
- Implementation of the CCSDS TM and TC standards for the AcubeSAT nanosatellite☆17Dec 22, 2025Updated 6 months ago
- JavaScript Tomasulo algorithm simulator☆18Nov 14, 2025Updated 7 months ago
- A DDS core written in VHDL.☆11Jan 5, 2019Updated 7 years ago
- Bluesky clone built with Flutter using the bluesky package, running on the AT Protocol☆10Sep 9, 2023Updated 2 years ago
- How to use node-local MPI rank IDs to manually map MPI ranks to GPUs☆14Apr 22, 2020Updated 6 years ago
- Automatic Conversion of Source Code for C to CUDA C☆23Apr 1, 2014Updated 12 years ago
- portFFT is a library implementing Fast Fourier Transforms using SYCL☆19Mar 1, 2025Updated last year
- misc analysis script for hqx (hq2x hq3x hq4x) algorithms☆19Jun 21, 2014Updated 12 years ago
- ☆10Sep 27, 2024Updated last year
- SGEMM and DGEMM subroutines using AVX512F instructions.☆15May 22, 2022Updated 4 years ago
- Matlab mex wrappers to cuSPARSE (NVIDIA)☆11Dec 10, 2025Updated 6 months ago
- ☆12Jul 2, 2023Updated 3 years ago
- A simple blogging web application built with the Leptos framework☆14Sep 18, 2024Updated last year
- Molecular integrals over Gaussian basis functions using sympy.☆16Oct 2, 2024Updated last year
- Ukrainian ELECTRA model☆12Mar 11, 2023Updated 3 years ago
- 📖 Twitter - React TS, Apollo Federation, Async GraphQL, Actix Web framework, Postgres SQL, Docker, Docker Compose, Redis, Apache Kafka, …☆15Aug 15, 2023Updated 2 years ago
- CUDA C simple application for Nvidia's GPU☆11Jun 7, 2022Updated 4 years ago
- Parallel optimization algorithms for sparse matrix-vector multiplication (OpenMP, AVX)☆11Jul 7, 2021Updated 4 years ago
- ☆74Feb 16, 2023Updated 3 years ago
- ☆10Aug 18, 2025Updated 10 months ago
- An implementation of SGEMV with performance comparable to cuBLAS.☆12May 21, 2021Updated 5 years ago
- The PyTorch implementation of paper "KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation"☆16Jul 4, 2025Updated last year
- ☆10Jul 4, 2022Updated 4 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…☆13Aug 12, 2022Updated 3 years ago
- A simplified cache simulator for instructional purposes☆15Dec 30, 2020Updated 5 years ago
- An experiment to compare the performance of Rust and Cython☆16Aug 7, 2021Updated 4 years ago
- A tutorial/example of the Python C-API and integration with CUDA kernels.☆14Jul 7, 2019Updated 6 years ago
- ☆14May 21, 2024Updated 2 years ago
- Optimized Deep Learning training course for Jean Zay☆19Oct 20, 2025Updated 8 months ago
- ☆17Jun 13, 2026Updated 3 weeks ago
- TinyRP is a simple lightweight HTTP reverse proxy made in golang☆12Apr 17, 2026Updated 2 months ago
- Visual bag of words for fast image matching☆25Apr 27, 2023Updated 3 years ago
- Using the QOI image format to save sequences of images☆10Feb 12, 2022Updated 4 years ago
- Software-based rasterization library☆11Jan 30, 2023Updated 3 years ago