fengChenHPC / kmeans_cuda
A high-performance implementation of the k-means algorithm with CUDA
☆18 Updated 10 years ago
Alternatives and similar repositories for kmeans_cuda:
Users who are interested in kmeans_cuda are comparing it to the libraries listed below.
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 Updated 8 years ago
- A minimalistic header only C++11 Neural Network library based on Eigen::Tensor ☆20 Updated 7 years ago
- Dolphin - a Deep Learning on MIC architecture Project. ☆25 Updated 10 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- Efficient LDA solution on GPUs. ☆24 Updated 6 years ago
- This is a C++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA ☆23 Updated 7 years ago
- Fork of magma to include more BLAS ☆28 Updated 8 years ago
- High-Performance Streaming Graph Analytics on GPUs ☆31 Updated 6 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD) ☆71 Updated 7 years ago
- Deep neural network framework (C/C++/CUDA). ☆31 Updated 9 years ago
- Test winograd convolution written in TVM for CUDA and AMDGPU ☆40 Updated 6 years ago
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- Simple and Cutting-edge Deep Learning Library accelerated with GPU using C++ AMP ☆19 Updated 8 years ago
- ICML2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial) ☆17 Updated 5 years ago
- kmeans ☆54 Updated 8 years ago
- Convolutional Neural Network using Eigen and C++ ☆20 Updated 9 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆21 Updated 6 years ago
- HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent ☆33 Updated 8 years ago
- Communication-Minimizing 2D Convolution in GPU Registers ☆30 Updated 11 years ago
- Different implementations of sparse matrix multiplication. All matrices are in CSR format. The code contains different CUDA kernels for mu… ☆16 Updated 14 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆61 Updated 4 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 7 years ago
- CuSha is a CUDA-based vertex-centric graph processing framework that uses G-Shards and CW representations. ☆52 Updated 9 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- A platform for distributed optimization experiments using OpenMPI ☆20 Updated 7 years ago
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations ☆16 Updated 4 years ago
- CNNs in Halide ☆23 Updated 9 years ago
- Asynchronous Multi-GPU Programming Framework ☆45 Updated 3 years ago
- Asynchronous Stochastic Gradient Descent with Delay Compensation ☆21 Updated 7 years ago