umfranzw / cuda-reduction-example
This example starts with a simple sum reduction in CUDA, then steps through a series of optimizations that improve its performance on the GPU. The examples were created alongside a series of lectures on GPGPU computing for an undergraduate parallel computing course. The lecture slides are in the slides/ directory.
☆12 · Updated 5 years ago
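As a rough illustration of the kind of sum reduction described above (not the repository's own code), the sketch below emulates on the CPU one common variant of the per-block shared-memory tree reduction, using "sequential addressing". The function names `reduceBlock` and `sumReduce` and the block size are hypothetical. In a real CUDA kernel, the inner `tid` loop runs in parallel across threads, with `__syncthreads()` between halving steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: emulates one thread block's shared-memory tree reduction.
// sdata plays the role of the block's shared-memory array; its size must be a power of two.
long long reduceBlock(std::vector<long long> sdata) {
    const std::size_t n = sdata.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two block size
    // Sequential addressing: each step, "thread" tid < stride adds the element
    // stride positions away, halving the active range until sdata[0] holds the sum.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On the GPU these iterations run concurrently, then __syncthreads().
        for (std::size_t tid = 0; tid < stride; ++tid) {
            sdata[tid] += sdata[tid + stride];
        }
    }
    return sdata[0];
}

// Hypothetical driver: splits the input into blocks, reduces each block,
// then adds the per-block partial sums (a second kernel or a host loop on a GPU).
long long sumReduce(const std::vector<int>& input, std::size_t blockSize) {
    long long total = 0;
    for (std::size_t start = 0; start < input.size(); start += blockSize) {
        std::vector<long long> sdata(blockSize, 0);  // zero-pads the last partial block
        for (std::size_t i = 0; i < blockSize && start + i < input.size(); ++i) {
            sdata[i] = input[start + i];
        }
        total += reduceBlock(sdata);
    }
    return total;
}
```

For example, calling `sumReduce` on the values 1 through 1000 with `blockSize = 256` returns 500500. Sequential addressing keeps the active "threads" contiguous (tid < stride), which on a GPU avoids the warp divergence of the naive interleaved version. Because each step only writes indices below `stride` and only reads indices at or above it, the parallel version has no read/write race within a step.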
Alternatives and similar repositories for cuda-reduction-example
Users who are interested in cuda-reduction-example are comparing it to the repositories listed below.
- ☆197 · Updated 3 weeks ago
- PyTorch model to RTL flow for low-latency inference ☆130 · Updated last year
- FlexGripPlus: an open-source GPU model for reliability evaluation and microarchitectural simulation ☆112 · Updated 2 years ago
- A tool to deploy Deep Neural Networks on PULP-based SoCs ☆90 · Updated 3 months ago
- Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis". ☆204 · Updated 4 years ago
- Vulkan-Sim is a GPU architecture simulator for Vulkan ray tracing based on GPGPU-Sim and Mesa. ☆75 · Updated 9 months ago
- ☆71 · Updated last year
- A heterogeneous accelerator-centric compute cluster ☆30 · Updated last week
- Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures. ☆430 · Updated last month
- High-level synthesis (HLS) implementation of Sparse Matrix Vector Multiplication ☆18 · Updated 3 years ago
- ☆46 · Updated 6 years ago
- Matrix Operation Library for FPGA https://xilinx.github.io/gemx/ ☆63 · Updated 6 years ago
- CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture ☆159 · Updated this week
- AutoSA: Polyhedral-Based Systolic Array Compiler ☆230 · Updated 2 years ago
- RTL implementation of Flex-DPE. ☆115 · Updated 5 years ago
- A framework for fast exploration of the depth-first scheduling space for DNN accelerators ☆41 · Updated 2 years ago
- ☆47 · Updated 2 years ago
- Systolic array implementations for Cholesky, LU, and QR decomposition ☆46 · Updated last year
- Exercises for exploring the Fibertree, Timeloop and Accelergy tools ☆107 · Updated 7 months ago
- Benchmark framework for compute-in-memory-based accelerators for deep neural networks (inference-engine focused) ☆74 · Updated last year
- CSV spreadsheets and other material for AI accelerator survey papers ☆182 · Updated last year
- An FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM). ☆91 · Updated last year
- Eyeriss chip simulator ☆38 · Updated 5 years ago
- Performance Prediction Toolkit for GPUs ☆39 · Updated 3 years ago
- FlexASR: A Reconfigurable Hardware Accelerator for Attention-based Seq-to-Seq Networks ☆49 · Updated 8 months ago
- Linux Docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop ☆59 · Updated last month
- OpenCGRA is an open-source framework for modeling, testing, and evaluating CGRAs. ☆161 · Updated 2 years ago
- ☆118 · Updated last week
- A scalable High-Level Synthesis framework on MLIR ☆282 · Updated last year
- This course provides professors with an understanding of high-level synthesis design methodologies necessary to develop digital systems u… ☆54 · Updated 7 years ago