baidu-research / baidu-allreduce
☆568 Updated 6 years ago
Related projects
Alternatives and complementary repositories for baidu-allreduce
- ☆374 Updated 7 years ago
- Collective communications library with various primitives for multi-machine training. ☆1,227 Updated this week
- GPU-specialized parameter server for GPU machine learning. ☆100 Updated 6 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆291 Updated 5 years ago
- Reliable Allreduce and Broadcast Interface for distributed machine learning ☆507 Updated 4 years ago
- A lightweight parameter server interface ☆74 Updated last year
- ☆378 Updated 2 years ago
- Deep learning system course ☆218 Updated 5 years ago
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. ☆263 Updated last year
- PMLS-Caffe: Distributed Deep Learning Framework for Parallel ML System ☆194 Updated 6 years ago
- CS294; AI For Systems and Systems For AI ☆221 Updated 5 years ago
- A lightweight parameter server interface ☆1,539 Updated last year
- Documentation for StreamExecutor open source proposal ☆83 Updated 8 years ago
- ☆127 Updated 6 years ago
- TVM integration into PyTorch ☆452 Updated 4 years ago
- A common bricks library for building scalable and portable distributed machine learning. ☆865 Updated 5 months ago
- High-performance, cross-platform inference engine; Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices. ☆532 Updated 2 years ago
- Guide for building custom op for TensorFlow ☆378 Updated last year
- Tutorial code on how to build your own Deep Learning System in 2k Lines ☆126 Updated 7 years ago
- TensorFlow source code reading notes ☆189 Updated 6 years ago
- The Tensor Algebra SuperOptimizer for Deep Learning ☆692 Updated last year
- [Deprecated] The TensorFlow Profiler (TFProf) UI provides a visual interface for profiling TensorFlow models. ☆136 Updated 5 years ago
- HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training ☆946 Updated last month
- To make it easy to benchmark AI accelerators ☆179 Updated last year
- moved to https://github.com/dmlc/ps-lite ☆649 Updated 9 years ago
- A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters ☆157 Updated 7 months ago
- Dive into Deep Learning Compiler ☆643 Updated 2 years ago
- Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning ☆1,110 Updated 5 years ago
- Open single and half precision gemm implementations ☆374 Updated last year
- Official code for "Writing Distributed Applications with PyTorch", PyTorch Tutorial ☆255 Updated last year