krulis-martin / cuda-kmeans
A novel, highly optimized CUDA implementation of the k-means algorithm.
☆35 · Updated 3 years ago
Alternatives and similar repositories for cuda-kmeans:
Users who are interested in cuda-kmeans are comparing it to the libraries listed below:
- A warp-oriented dynamic hash table for GPUs ☆73 · Updated last year
- BGHT: High-performance static GPU hash tables. ☆63 · Updated 2 weeks ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆71 · Updated 4 years ago
- ☆71 · Updated 3 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆135 · Updated 2 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆88 · Updated last year
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆108 · Updated 3 weeks ago
- Codes of the paper "Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions" that was published in SIGMOD 2018. Authors… ☆30 · Updated 6 years ago
- A tool for examining GPU scheduling behavior. ☆81 · Updated 8 months ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 · Updated 4 years ago
- ☆39 · Updated 3 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆82 · Updated this week
- A Library for fast Hash Tables on GPUs ☆115 · Updated 2 years ago
- cuDNN sample codes provided by Nvidia ☆45 · Updated 6 years ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 · Updated 6 years ago
- End to End steps for adding custom ops in PyTorch. ☆21 · Updated 4 years ago
- CUDA Matrix Multiplication Optimization ☆181 · Updated 9 months ago
- TLB Benchmarks ☆33 · Updated 7 years ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆181 · Updated 2 months ago
- Artifacts of EVT ASPLOS'24 ☆23 · Updated last year
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs ☆75 · Updated 10 months ago
- Efficient Top-K implementation on the GPU ☆176 · Updated 6 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆86 · Updated 2 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆24 · Updated 3 years ago
- a CUDA implementation of a priority queue ☆84 · Updated 4 years ago
- IMPACT GPU Algorithms Teaching Labs ☆57 · Updated 2 years ago
- Examples from Programming in Parallel with CUDA ☆134 · Updated 2 years ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆61 · Updated 7 months ago
- Implementation of the maximum network flow problem in CUDA. ☆32 · Updated 4 years ago
- TopK Algorithms Benchmark ☆10 · Updated 5 years ago