mperlet / matrix_multiplication
Parallel Matrix Multiplication Using OpenMP, Pthreads, and MPI
☆59 Updated 3 years ago
Alternatives and similar repositories for matrix_multiplication
Users who are interested in matrix_multiplication are comparing it to the libraries listed below.
- MPI Tutorial Exercises ☆46 Updated 12 years ago
- matrix multiplication in CUDA ☆125 Updated 2 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆16 Updated 6 years ago
- This repository stores all of the OLCF vector addition tutorials ☆25 Updated 11 years ago
- IMPACT GPU Algorithms Teaching Labs ☆59 Updated 2 years ago
- Neural Network implementation in C++ running for MNIST database. ☆56 Updated 10 years ago
- Chai ☆47 Updated 2 months ago
- ☆34 Updated 3 years ago
- Sparse Matrix-Vector Multiplication implementations in C ☆22 Updated 3 years ago
- Graph500 reference implementations ☆181 Updated 3 years ago
- Sparse Matrix-Matrix Multiplication Benchmark on Intel Xeon and Xeon Phi (KNC, KNL) from blog post: ☆12 Updated 9 years ago
- A Comprehensive Benchmark Suite for Graph Computing ☆70 Updated 6 years ago
- Learn OpenMP examples step by step ☆101 Updated last year
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆63 Updated 5 years ago
- openmp examples ☆150 Updated 6 years ago
- Modified version of PyTorch able to work with changes to GPGPU-Sim ☆57 Updated 3 years ago
- Benchmarks of Deep Neural Networks ☆39 Updated 4 years ago
- SST Architectural Simulation Components and Libraries ☆113 Updated this week
- CMU 15210 Parallel and Sequential Data Structures and Algorithms ☆21 Updated 10 years ago
- A high performance implementation of kmeans algorithm with cuda ☆18 Updated 11 years ago
- Darwin: A co-processor for long read alignment ☆16 Updated 6 years ago
- tools to create performance and roofline plots from measured data ☆60 Updated 11 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆22 Updated 7 years ago
- NeuroVectorizer is a framework that uses deep reinforcement learning (RL) to predict optimal vectorization compiler pragmas for for loops… ☆98 Updated 3 years ago
- A GPU cache model for research purposes ☆28 Updated 12 years ago
- Rodinia benchmark ☆200 Updated 2 years ago
- LonestarGPU: Irregular algorithms parallelized for GPUs ☆38 Updated 6 years ago
- ☆49 Updated 5 years ago
- HPC Challenge Benchmark ☆68 Updated 4 months ago
- A Dataflow library for graph analytics acceleration ☆14 Updated 10 years ago