facebookincubator / gloo

Collective communications library with various primitives for multi-machine training.

☆1,288

Alternatives and similar repositories for gloo:

Users who are interested in gloo are comparing it to the libraries listed below.

baidu-research / baidu-allreduce
☆580 Updated 7 years ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆3,641 Updated 2 weeks ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,293 Updated this week
dmlc / dlpack
common in-memory tensor structure
☆974 Updated this week
mlcommons / training
Reference implementations of MLPerf™ training benchmarks
☆1,658 Updated this week
baidu-research / DeepBench
Benchmarking Deep Learning operations on different hardware
☆1,082 Updated 3 years ago
facebookresearch / TensorComprehensions
A domain specific language to express machine learning workloads.
☆1,759 Updated last year
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆256 Updated 2 years ago
NVIDIA / nccl-tests
NCCL Tests
☆1,059 Updated 3 weeks ago
zdevito / ATen
ATen: A TENsor library for C++11
☆695 Updated 5 years ago
pytorch / tvm
TVM integration into PyTorch
☆452 Updated 5 years ago
dmlc / nnvm
☆1,659 Updated 6 years ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆982 Updated 6 months ago
baidu-research / tensorflow-allreduce
☆372 Updated 7 years ago
NervanaSystems / ngraph
nGraph has moved to OpenVINO
☆1,350 Updated 4 years ago
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆705 Updated 2 years ago
openai / blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
☆1,038 Updated last year
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆297 Updated 6 years ago
tensorflow / runtime
A performant and modular runtime for TensorFlow
☆759 Updated last month
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,030 Updated 2 weeks ago
google / gemmlowp
Low-precision matrix multiplication
☆1,798 Updated last year
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,742 Updated last year
pytorch / extension-cpp
C++ extensions in PyTorch
☆1,079 Updated 2 months ago
pytorch / elastic
PyTorch elastic training
☆730 Updated 2 years ago
dmlc / dmlc-core
A common bricks library for building scalable and portable distributed machine learning.
☆869 Updated this week
tensorflow / benchmarks
A benchmark framework for TensorFlow
☆1,151 Updated last year
msr-fiddle / pipedream
☆390 Updated 2 years ago
pytorch / benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
☆932 Updated this week
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆794 Updated this week
NVIDIA / aistore
AIStore: scalable storage for AI applications
☆1,460 Updated this week