krulis-martin / cuda-kmeans
A novel, highly optimized CUDA implementation of the k-means algorithm.
☆32Updated 2 years ago
Alternatives and similar repositories for cuda-kmeans:
Users who are interested in cuda-kmeans are comparing it to the libraries listed below.
- A warp-oriented dynamic hash table for GPUs☆72Updated last year
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs☆75Updated 7 months ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆88Updated last year
- BGHT: High-performance static GPU hash tables.☆57Updated 4 months ago
- Implementation of the maximum network flow problem in CUDA.☆28Updated 4 years ago
- A Fast Parallel Algorithm for HDBSCAN* Clustering☆55Updated 2 years ago
- ☆10Updated last year
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018☆72Updated 4 years ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU☆31Updated 2 weeks ago
- Near-storage compute aware file system and FPGA operator pipelines.☆29Updated 2 years ago
- CPU implementation of bucket-based farthest point sampling, achieving a 7-81x speedup over the conventional implementation☆14Updated last year
- Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication☆15Updated 9 months ago
- Sparse-dense matrix-matrix multiplication on GPUs☆15Updated 6 years ago
- Code for the paper "Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions", published in SIGMOD 2018. Authors…☆27Updated 5 years ago
- cuDNN sample code provided by NVIDIA☆45Updated 5 years ago
- ☆150Updated last year
- GGNN: State of the Art Graph-based GPU Nearest Neighbor Search☆145Updated 3 years ago
- End to End steps for adding custom ops in PyTorch.☆19Updated 4 years ago
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.☆174Updated 2 months ago
- SparseP is the first open-source Sparse Matrix Vector Multiplication (SpMV) software package for real-world Processing-In-Memory (PIM) ar…☆70Updated 2 years ago
- Study of Ampere's Sparse Matmul☆16Updated 4 years ago
- ☆38Updated 3 years ago
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search☆23Updated 5 years ago
- SNIG: Accelerated Large Sparse Neural Network Inference using Task Graph Parallelism☆34Updated 3 years ago
- A GPU algorithm for sparse matrix-matrix multiplication☆67Updated 4 years ago
- Massively parallel DBSCAN algorithm implemented in CUDA along with a KD-Tree for searching neighbors.☆11Updated 4 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆19Updated last year
- ngAP's artifact for ASPLOS'24☆19Updated this week
- PTX-EMU is a simple emulator for CUDA programs.☆26Updated last year
- Benchmark for measuring the performance of sparse and irregular memory access.☆76Updated this week