mark-poscablo / gpu-prefix-sum

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

☆28

Alternatives and similar repositories for gpu-prefix-sum

Users who are interested in gpu-prefix-sum are comparing it to the repositories listed below.

mattdean1 / cuda
An implementation of parallel exclusive scan in CUDA
☆62 Updated 7 years ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆71 Updated 9 years ago
knotman90 / cuStreamComp
Efficient CUDA Stream Compaction Library
☆34 Updated 2 years ago
horizon-research / rtnn
☆67 Updated 2 years ago
bryancatanzaro / trove
Full-speed Array of Structures access
☆172 Updated 2 years ago
mark-poscablo / gpu-radix-sort
CUDA implementation of parallel radix sort using Blelloch scan
☆64 Updated last year
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆70 Updated last month
ndd314 / cuda_examples
☆68 Updated 11 years ago
PatWie / cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
☆156 Updated 2 years ago
sleeepyjack / warpcore
A Library for fast Hash Tables on GPUs
☆125 Updated 3 years ago
owensgroup / SlabHash
A warp-oriented dynamic hash table for GPUs
☆74 Updated last year
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 Updated last year
kuiwuchn / 3x3_SVD_CUDA
Fast CUDA 3x3 SVD
☆75 Updated 6 years ago
apc-llc / whippletree
Whippletree, a novel approach to scheduling dynamic, irregular workloads on the GPU
☆22 Updated 9 years ago
cudpp / cudpp
CUDA Data Parallel Primitives Library
☆432 Updated 6 years ago
Ahdhn / CUDATemplate
Template for starting CUDA/C++ project using CMake with Github Action for CI
☆31 Updated last month
eyalroz / cuda-kat
CUDA kernel author's tools
☆113 Updated 3 years ago
dumerrill / merge-spmv
☆94 Updated 8 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113 Updated last year
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 Updated last year
ptheywood / cuda-cmake-github-actions
☆59 Updated 11 months ago
cusplibrary / cusplibrary
CUSP : A C++ Templated Sparse Matrix Library
☆415 Updated this week
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆57 Updated 2 years ago
NVIDIA / rtx_compute_samples
RTX compute samples
☆70 Updated 2 years ago
tpn / cuda-samples
☆61 Updated 2 years ago
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 Updated 2 years ago
GPUPeople / ACSpGEMM
Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"
☆29 Updated 5 years ago
zchee / cuda-sample
CUDA official sample codes
☆372 Updated 9 years ago
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆55 Updated 4 months ago
aekul / aether
☆52 Updated 6 years ago