CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆39Jul 19, 2017Updated 8 years ago
Alternatives and similar repositories for gpu-sum-reduction
Users that are interested in gpu-sum-reduction are comparing it to the libraries listed below.
- Massively parallel DBSCAN algorithm implemented in CUDA.☆12Jul 21, 2020Updated 5 years ago
- CUDA GPU implementation of GMRES iterative Solver☆10Apr 16, 2012Updated 13 years ago
- CUDA-accelerated minimum spanning tree algorithm -- data parallel Boruvka's algorithm☆21Apr 19, 2016Updated 9 years ago
- An ANN-LSTM based Model for Learning Individual Customer Behavior in Response to Electricity Prices☆11Mar 27, 2020Updated 6 years ago
- 3D Deformable Solid Simulator using the Finite Element Method☆12Mar 18, 2018Updated 8 years ago
- Implementation of parallel Breadth First Algorithm for graph traversal using CUDA and C++ language.☆34Dec 12, 2019Updated 6 years ago
- Reinforcement learning project using deep Q-learning to control the operations of an electrical microgrid☆11Jan 3, 2023Updated 3 years ago
- This is a c++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA☆25Oct 29, 2017Updated 8 years ago
- C++ lock-free queue.☆14Jun 24, 2020Updated 5 years ago
- Final Project of CSC417. Implementation of On the Accurate Large Scale Simulation of Ferrofluids☆15Dec 22, 2020Updated 5 years ago
- A parallel (CUDA) implementation of skiplist☆15Jan 24, 2019Updated 7 years ago
- "A Spatial Target Function for Metropolis Photon Tracing", ACM TOG, Code repository☆20Apr 24, 2023Updated 2 years ago
- OCCA Python API: JIT Compilation for Multiple Architectures☆11Dec 20, 2019Updated 6 years ago
- nVidia's CUDA accelerated Spin Transformations of Discrete Surfaces, based on the original code and paper by Keenan Crane, Ulrich Pinkall…☆17Mar 14, 2018Updated 8 years ago
- Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser☆13Nov 17, 2020Updated 5 years ago
- A new QR decomposition algorithm implemented in CUDA☆18Jun 24, 2024Updated last year
- Some CUDA kernels for signal processing (wavelets, convolutions..)☆16Mar 28, 2020Updated 6 years ago
- A C++ allocator based on cudaMallocManaged☆23Nov 19, 2018Updated 7 years ago
- Hex encode & decode a string, right from your terminal.☆10Jan 5, 2023Updated 3 years ago
- Incomplete-Cholesky preconditioned conjugate gradient algorithm implemented with cuBLAS/cuSPARSE☆12Jun 24, 2022Updated 3 years ago
- Parallel FFT for big integer multiplication. Written in three versions: MPI, OpenMP and CUDA(cufft).☆15Oct 19, 2020Updated 5 years ago
- IEEE 754-based c++ half-precision floating point library forked from http://half.sourceforge.net☆25Sep 23, 2021Updated 4 years ago
- PhD independent study: implementing basic and important soft body deformation papers☆26Feb 20, 2018Updated 8 years ago
- Easier, quicker command-line CUDA profiling☆53Sep 17, 2024Updated last year
- Code used for a paper in the journal 电网技术 (Power System Technology)☆18Sep 29, 2018Updated 7 years ago
- Yet another tensor library☆23Mar 29, 2017Updated 9 years ago
- CUDA implementation of the K-Means clustering algorithm☆13Sep 1, 2024Updated last year
- GPU-Accelerated multigrid solver for Poisson's equation in 2D☆29Apr 25, 2021Updated 4 years ago
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"☆19Jul 11, 2024Updated last year
- Final Project for Parallel Computing at CMU (15-618/15-418)☆10May 13, 2016Updated 9 years ago
- Computing FLOPs with Intel Software Development Emulator (Intel SDE)☆26Oct 22, 2023Updated 2 years ago
- Parallel SpMV using CSR representation, built in CUDA☆14Jun 27, 2020Updated 5 years ago
- Matrix Accelerator Generator for GeMM Operations based on SIGMA Architecture in CHISEL HDL☆15Mar 21, 2024Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Jun 14, 2023Updated 2 years ago
- ☆15Feb 15, 2018Updated 8 years ago
- Base container for developing C++ and Fortran HPC applications☆18Jun 14, 2022Updated 3 years ago
- A genetic algorithm to find optimal solutions for TSP (Travelling Salesman Problem) using the CUDA Architecture (GPU)☆18Aug 21, 2015Updated 10 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- Handwritten Digit Recognition Using Neural Network by Python☆10May 10, 2018Updated 7 years ago