CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆39Jul 19, 2017Updated 8 years ago
Alternatives and similar repositories for gpu-sum-reduction
Users that are interested in gpu-sum-reduction are comparing it to the libraries listed below.
- Massively parallel DBSCAN algorithm implemented in CUDA.☆12Jul 21, 2020Updated 5 years ago
- A collection of awesome algorithms, implemented in CUDA.☆26Feb 6, 2018Updated 8 years ago
- CUDA-accelerated minimum spanning tree algorithm -- data parallel Boruvka's algorithm☆21Apr 19, 2016Updated 10 years ago
- An ANN-LSTM based Model for Learning Individual Customer Behavior in Response to Electricity Prices☆11Mar 27, 2020Updated 6 years ago
- CUDA C implementation of Principal Component Analysis (PCA) through Singular Value Decomposition (SVD) using a highly parallelisable vers…☆30May 10, 2019Updated 6 years ago
- This is a C++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA☆25Oct 29, 2017Updated 8 years ago
- C++ lock-free queue.☆14Jun 24, 2020Updated 5 years ago
- Final Project of CSC417. Implementation of On the Accurate Large Scale Simulation of Ferrofluids☆15Dec 22, 2020Updated 5 years ago
- Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.☆13Sep 21, 2020Updated 5 years ago
- A parallel (CUDA) implementation of skiplist☆15Jan 24, 2019Updated 7 years ago
- "A Spatial Target Function for Metropolis Photon Tracing", ACM TOG, Code repository☆20Apr 24, 2023Updated 2 years ago
- DTMF with Arduino☆10Jul 15, 2022Updated 3 years ago
- OCCA Python API: JIT Compilation for Multiple Architectures☆11Dec 20, 2019Updated 6 years ago
- This is a LSQR-CUDA implementation written by Lawrence Ayers under the supervision of Stefan Guthe of the GRIS institute at the Technisch…☆13May 11, 2023Updated 2 years ago
- This repo contains source code in Python as well as CUDA for VRP☆14Jun 16, 2023Updated 2 years ago
- nVidia's CUDA accelerated Spin Transformations of Discrete Surfaces, based on the original code and paper by Keenan Crane, Ulrich Pinkall…☆17Mar 14, 2018Updated 8 years ago
- Lattice Boltzmann D3Q19 simulation for single phase flows☆11Jan 30, 2017Updated 9 years ago
- Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser☆13Nov 17, 2020Updated 5 years ago
- A new QR decomposition algorithm implemented in CUDA☆18Jun 24, 2024Updated last year
- Some CUDA kernels for signal processing (wavelets, convolutions..)☆16Mar 28, 2020Updated 6 years ago
- A C++ allocator based on cudaMallocManaged☆23Nov 19, 2018Updated 7 years ago
- Dijkstra's Algorithm implemented in C/C++ using standard C, OpenMP and CUDA☆13Dec 12, 2015Updated 10 years ago
- An Etcd KV storage system implemented in C++☆14May 21, 2023Updated 2 years ago
- Incomplete-Cholesky preconditioned conjugate gradient algorithm implemented with cuBLAS/cuSPARSE☆12Jun 24, 2022Updated 3 years ago
- Parallel FFT for big integer multiplication. Written in three versions: MPI, OpenMP and CUDA (cuFFT).☆15Oct 19, 2020Updated 5 years ago
- PhD independent study: implementing basic and important soft body deformation papers☆26Feb 20, 2018Updated 8 years ago
- Singularity recipes for OpenFOAM☆12Jan 2, 2022Updated 4 years ago
- Experience Lab is a set of utilities that assist in creating instances of Microsoft Azure Data Manager for Energy, performing data loads,…☆18Apr 10, 2026Updated last week
- Code used in papers for the journal 电网技术 (Power System Technology)☆19Sep 29, 2018Updated 7 years ago
- Easier, quicker command-line CUDA profiling☆54Sep 17, 2024Updated last year
- Parallelizing Strassen’s matrix multiplication using OpenMP, MPI and CUDA.☆16Nov 27, 2021Updated 4 years ago
- TensorRT half precision inference routine on an API-based TensorRT model☆12Jul 3, 2018Updated 7 years ago
- CUDA implementation of the K-Means clustering algorithm☆13Sep 1, 2024Updated last year
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"☆19Jul 11, 2024Updated last year
- Final Project for Parallel Computing at CMU (15-618/15-418)☆10May 13, 2016Updated 9 years ago
- A quick-and-dirty attempt to get scoped tasks in Rust.☆14Jun 4, 2023Updated 2 years ago
- Parallel SpMV using CSR representation, built in CUDA☆14Jun 27, 2020Updated 5 years ago
- Matrix Accelerator Generator for GeMM Operations based on SIGMA Architecture in CHISEL HDL☆15Mar 21, 2024Updated 2 years ago
- A Django project template for microservices☆13Mar 18, 2020Updated 6 years ago