umfranzw / cuda-reduction-example

This example starts with a simple sum reduction in CUDA, then steps through a series of optimizations that improve its performance on the GPU. The examples were created alongside a series of lectures on GPGPU computing for an undergraduate parallel computing course. You can find the lecture slides in the slides/ directory.

☆13

Alternatives and similar repositories for cuda-reduction-example

Users who are interested in cuda-reduction-example are comparing it to the repositories listed below:

OpenGPGPU / opengpgpu
☆68 Updated 7 months ago
ehw-fit / tf-approximate
Approximate layers - TensorFlow extension
☆27 Updated last month
enyac-group / MaxEVA
MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine (accepted as full paper at FPT'23)
☆21 Updated last year
sharc-lab / FPGA_ECE8893
☆35 Updated 2 months ago
cornell-zhang / HiSparse
High-Performance Sparse Linear Algebra on HBM-Equipped FPGAs Using HLS
☆91 Updated 8 months ago
spcl / hls_tutorial_examples
Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".
☆199 Updated 3 years ago
Zaoldyeckk / High-Level-Synthesis-Flow-on-Zynq-using-Vivado-HLS
This course provides professors with an understanding of high-level synthesis design methodologies necessary to develop digital systems u…
☆53 Updated 6 years ago
horizon-research / systolic-array-dataflow-optimizer
A general framework for optimizing DNN dataflow on systolic array
☆36 Updated 4 years ago
accel-sim / gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated…
☆54 Updated this week
sfu-arch / SpGEMM
☆35 Updated 4 years ago
zeasa / nvdla-compiler
☆44 Updated 5 years ago
ubc-aamodt-group / vulkan-sim
Vulkan-Sim is a GPU architecture simulator for Vulkan ray tracing based on GPGPU-Sim and Mesa.
☆60 Updated 4 months ago
KastnerRG / cgra4ml
An Open Workflow to Build Custom SoCs and run Deep Models at the Edge
☆79 Updated 2 weeks ago
natu4u / GSOC_TensorCore
TensorCore Vector Processor for Deep Learning - Google Summer of Code Project
☆22 Updated 3 years ago
dcblack / ModernSystemC
Example code for Modern SystemC using Modern C++
☆63 Updated 2 years ago
lukasc-ch / ExtendedBitPlaneCompression
Provides the code for the paper "EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators" by Luk…
☆19 Updated 5 years ago
Xilinx / xup_aie_training
Hands-on experience programming AI Engines using Vitis Unified Software Platform
☆40 Updated 10 months ago
hlslibs / ac_math
Algorithmic C Math Library
☆62 Updated 2 weeks ago
arc-research-lab / CHARM
CHARM: Composing Heterogeneous Accelerators on Heterogeneous SoC Architecture
☆143 Updated this week
spcl / FBLAS
BLAS implementation for Intel FPGA
☆78 Updated 4 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆95 Updated 2 years ago
RaulMurillo / Flo-Posit
Posit Arithmetic Cores generated with FloPoCo
☆24 Updated 11 months ago
chenxm1986 / nicslu
Parallel sparse direct solver for circuit simulation
☆43 Updated 2 years ago
kaiiiz / hls-spmv
High-level synthesis (HLS) implementation of Sparse Matrix Vector Multiplication
☆15 Updated 3 years ago
suchandler96 / gem5-NVDLA
☆30 Updated 2 months ago
Accelergy-Project / accelergy-timeloop-infrastructure
Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop
☆52 Updated last month
chenxm1986 / cktso
Pursuing the best performance of linear solver in circuit simulation
☆37 Updated 3 months ago
sunlex0717 / DissectingTensorCores
☆96 Updated last year
jha-lab / acceltran
[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
☆44 Updated last year
linghaosong / Sextans
An FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM).
☆79 Updated 10 months ago