ignatij / knn-mpi
Parallel implementation of kNN using MPI
☆17 Updated 2 years ago
Alternatives and similar repositories for knn-mpi:
Users who are interested in knn-mpi are comparing it to the libraries listed below.
- Parallel implementation of Graph Convolutional Networks on CPU ☆17 Updated 5 years ago
- Further development has been moved to a new repository https://github.com/wangyiqiu/dbscan-python ☆18 Updated 2 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 Updated 3 years ago
- Asynchronous Multi-GPU Programming Framework ☆46 Updated 3 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆55 Updated 2 years ago
- Introduction to CUDA programming and debugging ☆13 Updated 2 years ago
- iBFS: Concurrent Breadth-First Search on GPUs. SIGMOD'16 ☆24 Updated 7 years ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆13 Updated last year
- RLib is a header-only library for easier usage of RDMA. ☆45 Updated 4 years ago
- PetPS: Supporting Huge Embedding Models with Tiered Memory ☆30 Updated 11 months ago
- A Fast Parallel Algorithm for HDBSCAN* Clustering ☆58 Updated 2 years ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 Updated 2 years ago
- A novel, highly optimized CUDA implementation of the k-means algorithm. ☆35 Updated 3 years ago
- Parallel Approximate Nearest Neighbor Search ☆13 Updated 2 years ago
- A Framework for Graph Sampling and Random Walk on GPUs. ☆39 Updated 2 months ago
- A warp-oriented dynamic hash table for GPUs ☆73 Updated last year
- SIMD-X: Programming and Processing of Graph Algorithms on GPUs [USENIX ATC '19] ☆20 Updated 4 years ago
- TopK Algorithms Benchmark ☆10 Updated 5 years ago
- A fully adaptive, zero-tuning parameter manager that enables efficient distributed machine learning training ☆20 Updated 2 years ago
- This is the implementation repository of our OSDI'23 paper: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory. ☆59 Updated 5 months ago
- Reading seminar in Harvard Cloud Networking and Systems Group ☆16 Updated 2 years ago
- General system research material (not limited to paper) reading notes. ☆21 Updated 4 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆19 Updated last year
- Code accompanying Zhihu articles ☆13 Updated 2 years ago
- Website for the systems seminar at UIUC ☆17 Updated last week
- Multi-Instance-GPU profiling tool ☆57 Updated 2 years ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU ☆30 Updated 2 weeks ago
- An Attention Superoptimizer ☆21 Updated 3 months ago
- ☆10 Updated 3 years ago
- A computation-parallel deep learning architecture. ☆13 Updated 5 years ago