ignatij / knn-mpi
Parallel implementation of kNN using MPI
☆ 17 · Updated last year
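The repository's code is not reproduced on this page. The sketch below assumes a common data-parallel scheme for distributed kNN: the reference set is split across ranks, each rank finds its local k nearest candidates, and a root rank merges them. It simulates the ranks sequentially in plain Python instead of calling MPI, and the function names (`scatter`, `local_knn`, `distributed_knn`, `classify`) are hypothetical. In a real MPI program the per-shard step would run on separate ranks, and the merge would follow an `MPI_Gather` on rank 0.

```python
import heapq
import math
from collections import Counter

# Simulation only: no MPI calls are made; "ranks" run one after another.

def scatter(points, n_ranks):
    """Split the reference set into n_ranks contiguous shards (analogous to MPI_Scatterv)."""
    size, extra = divmod(len(points), n_ranks)
    shards, start = [], 0
    for r in range(n_ranks):
        end = start + size + (1 if r < extra else 0)  # first `extra` ranks get one more point
        shards.append(points[start:end])
        start = end
    return shards

def local_knn(query, shard, k):
    """Return the k (distance, label) pairs in one shard that are closest to `query`."""
    return heapq.nsmallest(k, ((math.dist(query, p), label) for p, label in shard))

def distributed_knn(query, points, k, n_ranks):
    """Compute per-shard top-k candidates, then merge them into the global top-k."""
    shards = scatter(points, n_ranks)
    # Parallel step: each rank works only on its own shard.
    partials = [local_knn(query, shard, k) for shard in shards]
    # Root step: merge at most n_ranks * k gathered candidates.
    return heapq.nsmallest(k, (c for part in partials for c in part))

def classify(query, points, k, n_ranks):
    """Majority vote over the labels of the k nearest neighbours."""
    neighbours = distributed_knn(query, points, k, n_ranks)
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

The merge is exact: any point among the global k nearest is also among the k nearest in its own shard, so it always survives the local step. Each rank therefore only needs to send k candidates to the root, not its whole shard.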
Related projects:
- A Skew-Resistant Index for Processing-in-Memory (☆ 22, updated 9 months ago)
- STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth (☆ 15, updated last year)
- iBFS: Concurrent Breadth-First Search on GPUs. SIGMOD'16 (☆ 24, updated 7 years ago)
- An Attention Superoptimizer (☆ 19, updated 4 months ago)
- Vector search with bounded performance. (☆ 33, updated 7 months ago)
- A warp-oriented dynamic hash table for GPUs (☆ 70, updated 8 months ago)
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 (☆ 51, updated 2 years ago)
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU (☆ 28, updated 3 weeks ago)
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling (☆ 13, updated 11 months ago)
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo (☆ 9, updated last year)
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS (☆ 17, updated 2 years ago)
- Introduction to CUDA programming and debugging (☆ 9, updated last year)
- Cache Manager using Reinforcement Learning (☆ 9, updated 4 years ago)
- Asynchronous Multi-GPU Programming Framework (☆ 45, updated 3 years ago)
- Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors. (☆ 9, updated 3 years ago)
- GPU B-Tree with support for versioning (snapshots). (☆ 39, updated 5 months ago)
- Tiered Indexing is a general way to improve the memory utilization of buffer-managed data structures including B+tree, Hashing, Heap, and… (☆ 26, updated 4 months ago)
- A Sparse-tensor Communication Framework for Distributed Deep Learning (☆ 13, updated 2 years ago)
- Repository holding the code base for AC-SpGEMM: "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" (☆ 28, updated 4 years ago)
- Out-of-GPU-Memory Graph Processing with Minimal Data Transfer (☆ 50, updated last year)
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication (☆ 27, updated last year)
- A Fast Parallel Algorithm for HDBSCAN* Clustering (☆ 53, updated last year)
- SOTA Learning-augmented Systems (☆ 32, updated 2 years ago)