lattice-land / cuda-battery
Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU
☆30 · Updated last month
Related projects
Alternatives and complementary repositories for cuda-battery
- An expression-template-based linear algebra library running completely on the GPU using CUDA ☆22 · Updated 3 years ago
- SuiteSparse: a suite of sparse matrix packages by @DrTimothyAldenDavis et al. with native CMake support ☆52 · Updated 4 months ago
- High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc. ☆14 · Updated 6 months ago
- Vectorization of the kd-tree data structure and search algorithm ☆37 · Updated 6 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆43 · Updated last week
- Light and self-contained implementation of C++17 parallel algorithms. ☆32 · Updated this week
- A simple and fast library for running async tasks and executing task graphs. ☆42 · Updated last month
- LEMON Graph Library ☆32 · Updated 4 years ago
- BGHT: High-performance static GPU hash tables. ☆55 · Updated 2 months ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆75 · Updated 3 months ago
- A Collection of Parallel Algorithms for Computational Geometry ☆12 · Updated 2 years ago
- WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze ☆17 · Updated 5 years ago
- GPU B-Tree with support for versioning (snapshots). ☆44 · Updated 3 weeks ago
- A CUDA implementation of a priority queue ☆81 · Updated 4 years ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆27 · Updated 4 months ago
- Boost.org graph_parallel module ☆27 · Updated this week
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆52 · Updated 2 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 3 years ago
- A collection of min-cut/max-flow algorithms. ☆38 · Updated 2 years ago
- A Nonlinear Least Squares Minimizer ☆34 · Updated 12 years ago
- C++ Header-Only Library for High-Performance Tensor-Vector Multiplication ☆19 · Updated this week
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆82 · Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆146 · Updated last year
- Learn OpenMP examples step by step ☆86 · Updated 3 years ago
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SYCL - all it takes to sum a lot of numbers fast! ☆73 · Updated 6 months ago
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆42 · Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding PyTorch ☆16 · Updated last week
- SymPP: A Symbolic Library that compiles itself ☆13 · Updated 3 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- Parallel Graph Input Output ☆17 · Updated last year