NVIDIA / kmeans
kmeans clustering with multi-GPU capabilities
☆120 · Updated 2 years ago
Alternatives and similar repositories for kmeans:
Users who are interested in kmeans compare it to the libraries listed below.
- Efficient Top-K implementation on the GPU ☆175 · Updated 6 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆297 · Updated 6 years ago
- Python bindings for NVTX ☆66 · Updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 · Updated 8 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 · Updated 7 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 · Updated last year
- Simple example of implementing a new Tensorflow operation and its gradient in C++. ☆56 · Updated 6 years ago
- A warp-oriented dynamic hash table for GPUs ☆73 · Updated last year
- flexible-gemm conv of deepcore ☆17 · Updated 5 years ago
- Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm ☆34 · Updated 5 years ago
- Full-speed Array of Structures access ☆169 · Updated last year
- Codebase associated with the PyTorch compiler tutorial ☆45 · Updated 5 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 · Updated 7 years ago
- CNNs in Halide ☆23 · Updated 9 years ago
- TensorFlow and TVM integration ☆37 · Updated 4 years ago
- This repository contains the results and code for the MLPerf™ Training v0.5 benchmark. ☆35 · Updated last year
- Introduction to CUDA programming ☆116 · Updated 7 years ago
- CUDA Data Parallel Primitives Library ☆429 · Updated 6 years ago
- ☆22 · Updated 7 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 · Updated 7 years ago
- Tools and extensions for CUDA profiling ☆65 · Updated 5 years ago
- Example of how to use CUDA with CMake >= 3.8 ☆69 · Updated last year
- CUDA by practice ☆125 · Updated 5 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- ☆127 · Updated 7 years ago
- A CUDNN minimal deep learning training code sample using LeNet. ☆264 · Updated last year
- GGNN: State of the Art Graph-based GPU Nearest Neighbor Search ☆154 · Updated 2 months ago
- GPU-specialized parameter server for GPU machine learning. ☆101 · Updated 7 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆135 · Updated 7 years ago