cuMF / cumf_sgd
CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)
☆71 · Updated 7 years ago
Alternatives and similar repositories for cumf_sgd:
Users who are interested in cumf_sgd are comparing it to the libraries listed below.
- CUDA Matrix Factorization Library with Alternating Least Squares (ALS) ☆176 · Updated 6 years ago
- Efficient LDA solution on GPUs. ☆24 · Updated 6 years ago
- Machine Learning Toolkit for Extreme Scale (MaTEx) ☆111 · Updated 6 years ago
- Random Walk (Personalized PageRank) Algorithms for Large Graphs ☆73 · Updated 9 years ago
- GPU-specialized parameter server for GPU machine learning. ☆100 · Updated 6 years ago
- Cyclades ☆28 · Updated 6 years ago
- HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent ☆33 · Updated 8 years ago
- Light-weight GPU kernel interface for graph operations ☆15 · Updated 4 years ago
- ☆30 · Updated 7 years ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆70 · Updated 3 years ago
- MPI for Torch ☆60 · Updated 7 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 · Updated 5 years ago
- SRS - Fast Approximate Nearest Neighbor Search in High Dimensional Euclidean Space With a Tiny Index ☆55 · Updated 9 years ago
- FRED simulator and associated paper ☆26 · Updated 9 years ago
- Different implementation of sparse matrix multiplication. All matrices are in CSR format. The code contains different CUDA kernels for mu… ☆16 · Updated 14 years ago
- (Spring 2017) Assignment 2: GPU Executor ☆62 · Updated 7 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 · Updated 8 years ago
- Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning ☆33 · Updated 8 years ago
- GraphMat graph analytics framework ☆101 · Updated 2 years ago
- Training deep neural networks with low precision multiplications ☆63 · Updated 9 years ago
- LSH-GPU ANN package ☆93 · Updated 5 years ago
- cache-friendly multithread matrix factorization ☆88 · Updated 8 years ago
- A platform for distributed optimization experiments using OpenMPI ☆21 · Updated 7 years ago
- ☆127 · Updated 8 years ago
- ☆47 · Updated 5 years ago
- weighted deepwalk implementation in c++ ☆18 · Updated 8 years ago
- Test winograd convolution written in TVM for CUDA and AMDGPU ☆40 · Updated 6 years ago
- Graph Challenge ☆31 · Updated 5 years ago
- Proof of concept prototype to perform distributed training using BVLC/caffe, based on a parameter server implementation using MPI. Data p… ☆13 · Updated 9 years ago
- A distributed logistic regression system based on ps-lite. ☆45 · Updated 8 years ago