sleeepyjack / warpdrive
☆13 · Updated 3 years ago
Alternatives and similar repositories for warpdrive
Users who are interested in warpdrive are comparing it to the libraries listed below.
- Full-speed Array of Structures access ☆173 · Updated 2 years ago
- The Berkeley Container Library ☆124 · Updated 2 years ago
- A Library for fast Hash Tables on GPUs ☆126 · Updated last week
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆72 · Updated 9 years ago
- GPUfs - File system support for NVIDIA GPUs ☆97 · Updated 6 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆109 · Updated 10 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆298 · Updated 6 years ago
- OFI Programmer's Guide ☆53 · Updated 2 years ago
- ☆21 · Updated 4 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆48 · Updated 10 years ago
- Tools and extensions for CUDA profiling ☆64 · Updated 5 years ago
- Intel Heterogeneous Research Compiler (iHRC) ☆25 · Updated 2 years ago
- Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020. ☆22 · Updated 5 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆59 · Updated 3 years ago
- GPU Optimization and Memory Abstraction Framework ☆32 · Updated 5 years ago
- Memory system characterization benchmarks using atomic operations ☆14 · Updated last year
- A benchmark of some prominent C/C++ hash table implementations ☆105 · Updated 6 years ago
- Giddy - A lightweight GPU decompression library ☆44 · Updated 6 years ago
- This is a header only library offering a variety of dynamically growing concurrent hash tables. That all work by dynamically migrating th… ☆113 · Updated 9 months ago
- A framework that helps implementing swizzle GPU kernels ☆42 · Updated 5 years ago
- GraphBLAS Template Library (GBTL): C++ graph algorithms and primitives using semiring algebra as defined at graphblas.org ☆136 · Updated 2 years ago
- GraphMat graph analytics framework ☆102 · Updated 2 years ago
- A Distributed Multi-GPU System for Fast Graph Processing ☆65 · Updated 6 years ago
- ☆32 · Updated 5 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆57 · Updated 3 years ago
- sparse matrix pre-processing library ☆83 · Updated last year
- pytorch ucc plugin ☆23 · Updated 4 years ago
- ☆74 · Updated 2 years ago
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆81 · Updated 3 years ago
- Use CUDA intrinsics with user-defined types ☆48 · Updated 11 years ago