SC-SGS / Distributed_GPU_LSH_using_SYCL
Distributed k-nearest Neighbors using Locality Sensitive Hashing and SYCL
☆10, updated 3 years ago
Alternatives and similar repositories for Distributed_GPU_LSH_using_SYCL:
Users interested in Distributed_GPU_LSH_using_SYCL are comparing it to the libraries listed below:
- Parallel selection on GPUs (☆15, updated 4 years ago)
- A warp-oriented dynamic hash table for GPUs (☆73, updated last year)
- No-GIL Python environment featuring NVIDIA deep learning libraries (☆53, updated 3 weeks ago)
- Data Parallel Extension for NumPy (☆104, updated this week)
- Fast and full-featured Matrix Market I/O library for C++, Python, and R (☆76, updated 7 months ago)
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings (☆42, updated this week)
- MagmaDNN: a simple deep learning framework in C++ (☆50, updated 4 years ago)
- Generate simple index ranges in C++ and CUDA C++ (☆39, updated last year)
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme (☆57, updated last week)
- CUDA kernel author's tools (☆111, updated 2 years ago)
- BGHT: High-performance static GPU hash tables (☆62, updated 6 months ago)
- 🎃 GPU load-balancing library for regular and irregular computations (☆62, updated 9 months ago)
- CUDA Template Functions (☆19, updated 3 months ago)
- A unified framework across multiple programming platforms (☆36, updated 9 months ago)
- An extension library for the WMMA API (Tensor Core API) (☆93, updated 8 months ago)
- GPU-accelerated, error-bounded lossy compression for scientific data (☆73, updated 2 weeks ago)
- Python SYCL bindings and SYCL-based Python Array API library (☆110, updated this week)
- Serial and parallel implementations of matrix multiplication (☆40, updated 4 years ago)
- Code for the paper "Engineering a High-Performance GPU B-Tree", accepted to PPoPP 2019 (☆55, updated 2 years ago)
- Matrix multiplication on GPUs for matrices stored on a CPU; similar to cuBLASXt, but ported to both NVIDIA and AMD GPUs (☆31, updated 3 months ago)
- Highly parallel DBSCAN (HPDBSCAN) (☆43, updated 6 months ago)
- Distributed communication-optimal LU factorization algorithm (☆12, updated 3 years ago)
- Efficient SpGEMM on GPUs using CUDA and CSR (☆52, updated last year)
- Sparse matrix computation library for GPUs (☆54, updated 4 years ago)
- The CUDA target for Numba (☆91, updated this week)
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems (☆51, updated last month)
- Subset of BLAS routines optimized for NVIDIA GPUs (☆68, updated 2 years ago)