SC-SGS / Distributed_GPU_LSH_using_SYCL
Distributed k-nearest Neighbors using Locality Sensitive Hashing and SYCL
☆10 Updated 4 years ago
Alternatives and similar repositories for Distributed_GPU_LSH_using_SYCL
Users who are interested in Distributed_GPU_LSH_using_SYCL are comparing it to the libraries listed below.
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 Updated 4 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆158 Updated 2 years ago
- CUDA kernel author's tools ☆115 Updated 3 years ago
- A warp-oriented dynamic hash table for GPUs ☆76 Updated 2 years ago
- Python SYCL bindings and SYCL-based Python Array API library ☆121 Updated this week
- MagmaDNN: a simple deep learning framework in c++ ☆51 Updated 5 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆111 Updated 2 months ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast! ☆116 Updated 6 months ago
- Abstraction Library for Parallel Kernel Acceleration ☆401 Updated last week
- Data Parallel Extension for NumPy ☆121 Updated this week
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆86 Updated last year
- ☆59 Updated 4 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆56 Updated this week
- ☆44 Updated this week
- Analyze graph/hierarchical performance data using pandas dataframes ☆118 Updated 3 months ago
- Worked example of the process from Python source to CUDA kernel execution with Numba ☆45 Updated last year
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆81 Updated 5 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆212 Updated this week
- Template for GPU accelerated python libraries ☆51 Updated 2 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆30 Updated 4 years ago
- The Foundation for All Legate Libraries ☆233 Updated last week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆91 Updated last week
- Subset of BLAS routines optimized for NVIDIA GPUs ☆76 Updated 2 years ago
- A Library for fast Hash Tables on GPUs ☆132 Updated 3 months ago
- Concurrent CPU-GPU Programming using Task Models ☆106 Updated 6 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- BLAS implementation for Intel FPGA ☆78 Updated 5 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs ☆46 Updated 9 years ago
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆366 Updated last year