Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆116 · Jul 22, 2025 · Updated 7 months ago
Alternatives and similar repositories for ParallelReductionsBenchmark
Users who are interested in ParallelReductionsBenchmark are comparing it to the libraries listed below.
- Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake usin… ☆31 · Oct 14, 2025 · Updated 4 months ago
- A simple cross-platform speed & memory-efficiency benchmark for the most common hash-table implementations in the C++ world ☆12 · Dec 9, 2022 · Updated 3 years ago
- Link to this library and it will log all the LibC functions you are calling and how much time you are spending in them! ☆22 · Jan 4, 2025 · Updated last year
- GPGPU array on Vulkan ☆17 · Jun 3, 2023 · Updated 2 years ago
- NetworkX-like Python experience for Postgres, SQLite, MongoDB, and Neo4J ☆30 · Feb 28, 2025 · Updated last year
- My notes on various HPC papers. ☆26 · Jan 7, 2023 · Updated 3 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆24 · Nov 25, 2025 · Updated 3 months ago
- WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze ☆21 · Nov 18, 2019 · Updated 6 years ago
- SPar is an internal DSL for high-level stream parallelism ☆10 · Aug 2, 2020 · Updated 5 years ago
- Concurrent CPU-GPU Programming using Task Models ☆106 · Dec 19, 2019 · Updated 6 years ago
- gnucap mirror (read only) ☆31 · Feb 7, 2026 · Updated 3 weeks ago
- A Halide backend for ONNX ☆12 · Nov 5, 2019 · Updated 6 years ago
- Experimental ranges for CUDA ☆25 · Feb 1, 2019 · Updated 7 years ago
- Collection of samples and utilities for using ComputeCpp, Codeplay's SYCL implementation ☆325 · Aug 11, 2023 · Updated 2 years ago
- Wait-Free Eras (PPoPP '20) ☆10 · Jan 11, 2020 · Updated 6 years ago
- A wrapper around Python's ctypes for Nim-specific function signatures. ☆12 · Dec 12, 2017 · Updated 8 years ago
- Meet the C++ and Systems Design Group of Armenia! ☆13 · Oct 27, 2024 · Updated last year
- This repository contains some tools to monitor the UNC_CBO_CACHE_LOOKUP event of the C-Boxes. ☆12 · Oct 11, 2017 · Updated 8 years ago
- ☆13 · Jan 19, 2017 · Updated 9 years ago
- Awesome implicit data structures ☆24 · Oct 6, 2019 · Updated 6 years ago
- Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors. ☆13 · Sep 21, 2020 · Updated 5 years ago
- Case Studies for Halide performance against C++ and OpenCL ☆36 · Oct 9, 2013 · Updated 12 years ago
- Tries in C++ ☆13 · Aug 20, 2020 · Updated 5 years ago
- Apache ORC - the smallest, fastest columnar storage for Hadoop workloads ☆16 · Feb 18, 2026 · Updated last week
- ILP SAT Detailed Router ☆13 · Apr 14, 2020 · Updated 5 years ago
- FastFlow pattern-based parallel programming framework (formerly on sourceforge) ☆300 · Feb 11, 2026 · Updated 2 weeks ago
- parsertl: The Modular Parser Generator ☆16 · Aug 24, 2025 · Updated 6 months ago
- ⏱ Superfast ^Advanced wildcards++? | Unique algorithms that were implemented in native unmanaged C++ but are easily accessible in .NET via Con… ☆28 · Jul 18, 2021 · Updated 4 years ago
- Abstraction Library for Parallel Kernel Acceleration ☆407 · Feb 17, 2026 · Updated last week
- YOLOv8 / YOLOv11 models and code for CG / art image processing ☆20 · Aug 25, 2025 · Updated 6 months ago
- portability macros for compiler and hardware micro operations ☆37 · Jul 1, 2024 · Updated last year
- A streamlined CMake build system foundation for developing HPC software ☆284 · Feb 20, 2026 · Updated last week
- A modern Next-Generation Firewall application built with Rust, featuring a web-based dashboard for network security management. ☆27 · Jul 2, 2025 · Updated 7 months ago
- Dataframe for Nim ☆15 · Jul 25, 2019 · Updated 6 years ago
- outline and links for PLDI 2022 tutorial ☆17 · Jun 13, 2022 · Updated 3 years ago
- C++ code for RingQueue, SharedMemory and Semaphore ☆19 · Aug 29, 2020 · Updated 5 years ago
- Utility programs to pipe data across a RDMA-capable network ☆19 · Feb 7, 2026 · Updated 3 weeks ago
- Pybind11 tool for making docstrings from C++ comments ☆44 · Jan 15, 2026 · Updated last month
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ☆21 · May 18, 2025 · Updated 9 months ago