Geolm / simd_bitonic
Bitonic sort using SIMD (AVX/NEON) instructions
☆14 Updated 2 years ago
Alternatives and similar repositories for simd_bitonic:
Users who are interested in simd_bitonic are comparing it to the libraries listed below.
- GPU B-Tree with support for versioning (snapshots). ☆47 Updated 4 months ago
- C++ interfaces for RDMA access ☆66 Updated last month
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆56 Updated 2 years ago
- AVX512F and AVX2 versions of quick sort ☆105 Updated 7 years ago
- Library for lock-free locks ☆77 Updated last year
- Code and results for our paper "Analyzing Vectorized Hash Tables Across CPU Architectures" @ VLDB '23. ☆24 Updated last year
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆56 Updated 4 months ago
- ☆52 Updated 9 months ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations ☆45 Updated 5 years ago
- Evaluating different memory managers for dynamic GPU memory ☆25 Updated 4 years ago
- Packed Memory Array ☆17 Updated 10 years ago
- User-space Page Management ☆107 Updated 6 months ago
- Universal Presentation: A Header-only C++ Library to Cout STL containers and more ☆19 Updated last year
- Sample program for article "SIMD-ized searching in unique constant dictionary" (http://0x80.pl/articles/simd-search.html) ☆52 Updated 7 years ago
- ☆20 Updated 2 years ago
- A CUDA implementation of a priority queue ☆83 Updated 4 years ago
- A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue (DISC '19) ☆55 Updated last year
- A low level, low latency library, which can be used to accelerate network messages using shared memory and RDMA ☆74 Updated 4 years ago
- A tool for examining GPU scheduling behavior. ☆71 Updated 6 months ago
- A GPU-Accelerated In-Memory Key-Value Store (AWS-focused fork) ☆28 Updated 7 years ago
- ☆43 Updated 4 years ago
- GPU-Accelerated Lossless Data Compressors Survey ☆113 Updated 4 years ago
- Unit benchmarks of CUDA event APIs. ☆17 Updated 10 months ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆117 Updated last year
- Artifact for PPoPP 2018 paper "Making Pull-Based Graph Processing Performant" ☆23 Updated 4 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last year
- Persistent Memory Test Suite ☆13 Updated 4 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆84 Updated 10 months ago
- Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search? ☆28 Updated 3 years ago
- ☆14 Updated 5 years ago