NVIDIA / kmeans
kmeans clustering with multi-GPU capabilities
☆120Updated last year
Alternatives and similar repositories for kmeans:
Users who are interested in kmeans are comparing it to the libraries listed below:
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory☆297Updated 6 years ago
- Efficient Top-K implementation on the GPU☆155Updated 5 years ago
- Python bindings for NVTX☆66Updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM☆70Updated 8 years ago
- GPU-specialized parameter server for GPU machine learning.☆100Updated 6 years ago
- CUDA Data Parallel Primitives Library☆428Updated 6 years ago
- Some CUDA design patterns and a bit of template magic for CUDA☆149Updated last year
- ☆66Updated 11 years ago
- ☆21Updated 7 years ago
- flexible-gemm conv of deepcore☆17Updated 5 years ago
- kmeans☆54Updated 8 years ago
- A way to use CUDA to accelerate the top-k algorithm☆29Updated 7 years ago
- GPU-based large scale Approx. Nearest Neighbor Search, accepted at CVPR 2016☆91Updated 6 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆84Updated last year
- Open single and half precision gemm implementations☆377Updated last year
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL☆135Updated 7 years ago
- This repository contains the results and code for the MLPerf™ Training v0.5 benchmark.☆35Updated last year
- Deep Learning/GPU Architect/Autonomous Driving Positions☆80Updated 5 years ago
- A warp-oriented dynamic hash table for GPUs☆74Updated last year
- Example of how to use CUDA with CMake >= 3.8☆69Updated last year
- ☆127Updated 7 years ago
- A CUDNN minimal deep learning training code sample using LeNet.☆264Updated last year
- Simple example of implementing a new Tensorflow operation and its gradient in C++.☆56Updated 5 years ago
- TensorFlow and TVM integration☆37Updated 4 years ago
- CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%☆187Updated 7 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆63Updated 2 years ago
- oneCCL Bindings for Pytorch*☆89Updated last week
- CLTune: An automatic OpenCL & CUDA kernel tuner☆175Updated 2 years ago
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.☆56Updated last year
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Updated 7 years ago