cuMF / cumf_sgd
CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)
☆71 Updated 7 years ago
Alternatives and similar repositories for cumf_sgd:
Users who are interested in cumf_sgd are comparing it to the libraries listed below.
- CUDA Matrix Factorization Library with Alternating Least Squares (ALS) ☆177 Updated 6 years ago
- HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent ☆33 Updated 8 years ago
- GPU-specialized parameter server for GPU machine learning. ☆101 Updated 7 years ago
- MPI for Torch ☆60 Updated 7 years ago
- FRED simulator and associated paper ☆26 Updated 9 years ago
- Efficient LDA solution on GPUs. ☆24 Updated 6 years ago
- Machine Learning Toolkit for Extreme Scale (MaTEx) ☆110 Updated 6 years ago
- ☆30 Updated 7 years ago
- A platform for distributed optimization experiments using OpenMPI ☆21 Updated 7 years ago
- cache-friendly multithread matrix factorization ☆88 Updated 8 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆297 Updated 6 years ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆71 Updated 3 years ago
- Cyclades ☆28 Updated 7 years ago
- Random Walk (Personalized PageRank) Algorithms for Large Graphs ☆73 Updated 9 years ago
- Training deep neural networks with low precision multiplications ☆63 Updated 9 years ago
- LR and FM models solved by FTRL and SGD in parallel on MPI ☆111 Updated 7 years ago
- CUDA implementation of k-means ☆23 Updated 11 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- (Spring 2017) Assignment 2: GPU Executor ☆62 Updated 7 years ago
- An implementation of CF-NADE. Yin Zheng et al., "A Neural Autoregressive Approach to Collaborative Filtering", accepted by ICML 2016. ☆79 Updated 6 years ago
- ☆19 Updated 7 years ago
- Light-weight GPU kernel interface for graph operations ☆15 Updated 4 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation. ☆19 Updated 6 years ago
- Proof of concept prototype to perform distributed training using BVLC/caffe, based on a parameter server implementation using MPI. Data p… ☆13 Updated 9 years ago
- High Efficiency Convolution Kernel for Maxwell GPU Architecture ☆134 Updated 7 years ago
- Parallel Gradient Boosting Decision Trees ☆21 Updated 8 years ago
- cuda implementation of CBOW model (word2vec) ☆117 Updated 11 years ago
- A light-weight matrix factorization tool ☆39 Updated 7 years ago
- weighted deepwalk implementation in c++ ☆18 Updated 8 years ago