mattdean1 / cuda

An implementation of parallel exclusive scan in CUDA

☆62

Alternatives and similar repositories for cuda

Users who are interested in cuda are comparing it to the libraries listed below.

mark-poscablo / gpu-prefix-sum
CUDA implementation of exclusive prefix sum via Blelloch's algorithm
☆28 Updated 8 years ago
sleeepyjack / warpcore
A Library for fast Hash Tables on GPUs
☆125 Updated 3 years ago
mark-poscablo / gpu-radix-sort
CUDA implementation of parallel radix sort using Blelloch scan
☆64 Updated last year
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆70 Updated last month
CUDA-Tutorial / CodeSamples
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
☆91 Updated last year
mark-poscablo / gpu-sum-reduction
CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆37 Updated 8 years ago
owensgroup / SlabHash
A warp-oriented dynamic hash table for GPUs
☆74 Updated last year
horizon-research / rtnn
☆67 Updated 2 years ago
bryancatanzaro / trove
Full-speed Array of Structures access
☆172 Updated 2 years ago
PatWie / cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
☆156 Updated 2 years ago
knotman90 / cuStreamComp
Efficient CUDA Stream Compaction Library
☆34 Updated 2 years ago
eyalroz / cuda-kat
CUDA kernel author's tools
☆113 Updated 3 years ago
hpcgarage / cuASR
cuASR: CUDA Algebra for Semirings
☆36 Updated 2 years ago
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 Updated last year
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 Updated 2 years ago
ingowald / cudaKDTree
☆267 Updated last month
gevtushenko / matrix_format_performance
☆29 Updated 5 years ago
nosferalatu / SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock free techniques
☆396 Updated last year
PhDP / cuda-cmake-gtest-gbench-starter
A cross-platform CUDA/C++17 starter project with google test and google benchmark support.
☆39 Updated 4 months ago
ptheywood / cuda-cmake-github-actions
☆59 Updated 11 months ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆71 Updated 9 years ago
dumerrill / merge-spmv
☆94 Updated 8 years ago
robertmaynard / code-samples
Source code examples from the Parallel Forall Blog
☆96 Updated 6 years ago
cusplibrary / cusplibrary
CUSP : A C++ Templated Sparse Matrix Library
☆415 Updated this week
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆103 Updated 2 weeks ago
llohse / libnpy
C++ library for reading and writing of numpy's .npy files
☆414 Updated 10 months ago
InteractiveComputerGraphics / cuNSearch
A C++/CUDA library to efficiently compute neighborhood information on the GPU for 3D point clouds within a fixed radius.
☆106 Updated last year
brian-kelley / CUDA-QR
A new QR decomposition algorithm implemented in CUDA
☆17 Updated last year
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆57 Updated 2 years ago
tpn / cuda-samples
☆61 Updated 2 years ago