crosetto / cupq

A CUDA implementation of a priority queue

☆84

Alternatives and similar repositories for cupq

Users who are interested in cupq are comparing it to the libraries listed below.


eyalroz / cuda-kat
CUDA kernel author's tools
☆115 · Updated 3 years ago
owensgroup / SlabHash
A warp-oriented dynamic hash table for GPUs
☆76 · Updated last year
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆114 · Updated 5 months ago
bryancatanzaro / trove
Full-speed Array of Structures access
☆176 · Updated 2 years ago
sleeepyjack / warpcore
A Library for fast Hash Tables on GPUs
☆130 · Updated 2 months ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆73 · Updated 10 years ago
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆71 · Updated 5 months ago
Heteroflow / Heteroflow
Concurrent CPU-GPU Programming using Task Models
☆105 · Updated 6 years ago
owensgroup / MVGpuBTree
GPU B-Tree with support for versioning (snapshots).
☆51 · Updated last year
pdziepak / ranges-gpu
Experimental ranges for CUDA
☆25 · Updated 6 years ago
berkeley-container-library / bcl
The Berkeley Container Library
☆126 · Updated last week
gevtushenko / cuda_benchmark
A library to benchmark CUDA code, similar to google benchmark.
☆30 · Updated 4 years ago
ogiroux / freestanding
☆71 · Updated 5 years ago
Kobzol / hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
☆390 · Updated 2 years ago
edanor / umesimd
UME::SIMD: A library for explicit SIMD vectorization.
☆91 · Updated 7 years ago
alpaka-group / mallocMC
mallocMC: Memory Allocator for Many Core Architectures
☆58 · Updated 3 weeks ago
agency-library / agency
Execution primitives for C++
☆154 · Updated 5 years ago
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 · Updated 2 years ago
eyalroz / gpu-kernel-runner
Runs a single CUDA/OpenCL kernel, taking its source from a file and its arguments from the command line.
☆24 · Updated 3 weeks ago
llnl / fpzip
Lossless compressor of multidimensional floating-point arrays
☆123 · Updated 5 years ago
dian-lun-lin / taro
Task graph-based asynchronous programming system using C++ coroutines
☆96 · Updated last year
AMDResearch / DAGEE
Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta…
☆48 · Updated 4 years ago
owensgroup / GpuBTree
Code for the paper "Engineering a High-Performance GPU B-Tree", accepted to PPoPP 2019
☆57 · Updated 3 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113 · Updated last year
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆56 · Updated 9 months ago
celerity / celerity-runtime
High-level C++ for Accelerator Clusters
☆154 · Updated 3 weeks ago
NVIDIA / jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆567 · Updated 3 months ago
NERSC / timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template…
☆366 · Updated last year
codeplaysoftware / portBLAS
Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.
☆260 · Updated 11 months ago
mgopshtein / cudacpp
C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.
☆55 · Updated 7 years ago