Geolm / simd_bitonic
Bitonic sort using SIMD (AVX/NEON) instructions
☆14 Updated 3 years ago
Alternatives and similar repositories for simd_bitonic
Users who are interested in simd_bitonic compare it to the libraries listed below:
- GPU B-Tree with support for versioning (snapshots). ☆48 Updated 8 months ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆56 Updated 3 years ago
- AVX512F and AVX2 versions of quick sort ☆104 Updated 7 years ago
- C++ interfaces for RDMA access ☆77 Updated last week
- A lock-free priority queue implementation ☆34 Updated 7 years ago
- A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs ☆73 Updated 2 years ago
- ☆54 Updated last year
- Library for lock-free locks ☆82 Updated 2 years ago
- Code and results for our paper "Analyzing Vectorized Hash Tables Across CPU Architectures" @ VLDB '23. ☆25 Updated last year
- GPU-Accelerated Lossless Data Compressors Survey ☆117 Updated 4 years ago
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” ☆41 Updated 2 years ago
- ☆20 Updated 2 years ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆122 Updated last year
- A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue (DISC '19) ☆60 Updated last year
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆65 Updated 8 months ago
- Artifact for PPoPP 2018 paper "Making Pull-Based Graph Processing Performant" ☆23 Updated 5 years ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations ☆45 Updated 5 years ago
- Parallel cuckoo hashing on GPUs with CUDA ☆11 Updated 5 years ago
- A low level, low latency library, which can be used to accelerate network messages using shared memory and RDMA ☆76 Updated 4 years ago
- Pointer-chasing memory benchmark (forked from Doug Pase's code). ☆59 Updated 11 years ago
- a CUDA implementation of a priority queue ☆84 Updated 4 years ago
- JSONPath Streaming with Bit-Parallel Fast-Forwarding ☆25 Updated 8 months ago
- An open-source BzTree implementation ☆92 Updated 3 years ago
- Montage is a system for building fast buffered persistent data structures on nonvolatile memory. ☆15 Updated 3 years ago
- A fast in-memory key-value store ☆49 Updated 7 years ago
- ☆15 Updated 5 years ago
- Sample program for article "SIMD-ized searching in unique constant dictionary" (http://0x80.pl/articles/simd-search.html) ☆52 Updated 8 years ago
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs. ☆19 Updated 2 years ago
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i… ☆14 Updated last year
- ☆29 Updated this week