NVIDIA / Bobber

Containerized testing of system components that impact AI workload performance

☆14

Alternatives and similar repositories for Bobber:

Users who are interested in Bobber are comparing it to the repositories listed below.

HewlettPackard / dlcookbook-dlbs
Deep Learning Benchmarking Suite
☆130 Updated 2 years ago
NVIDIA / Imageinary
Imageinary is a reproducible mechanism which is used to generate large image datasets at various resolutions. The tool supports multiple …
☆26 Updated last year
rapidsai / ucx-py
Python bindings for UCX
☆123 Updated this week
rapidsai / gpu-bdb
RAPIDS GPU-BDB
☆108 Updated 10 months ago
tensorflow / networking
Enhanced networking support for TensorFlow. Maintained by SIG-networking.
☆98 Updated 3 years ago
onnx / backend-scoreboard
Scoreboard for ONNX Backend Compatibility
☆27 Updated this week
dholt / slurm-gpu
Scheduling GPU cluster workloads with Slurm
☆74 Updated 6 years ago
aws / aws-ofi-nccl
This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
☆160 Updated this week
NVIDIA / nephele
Tools to deploy GPU clusters in the Cloud
☆30 Updated last year
NVIDIA / DALI_deps
3rd party dependencies for DALI project
☆10 Updated 2 weeks ago
NVIDIA / ais-k8s
Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.
☆88 Updated last week
uxlfoundation / oneCCL
oneAPI Collective Communications Library (oneCCL)
☆218 Updated last week
NVIDIA / ngc-container-replicator
NGC Container Replicator
☆28 Updated 2 years ago
run-ai / rntop
A top-like tool for monitoring GPUs in a cluster
☆84 Updated 11 months ago
NVIDIA / mlperf-common
NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions
☆24 Updated 3 weeks ago
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆18 Updated 2 years ago
mlcommons / logging
MLPerf™ logging library
☆32 Updated 3 weeks ago
mkuchnik / PlumberApp
Repository to go along with the paper "Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines"
☆9 Updated 2 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆114 Updated last year
rapidsai / ucxx
☆24 Updated this week
mlcommons / mlcube
MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible.
☆155 Updated 4 months ago
ray-project / ray_shuffling_data_loader
A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…
☆18 Updated 2 years ago
intel / torch-ccl
oneCCL Bindings for Pytorch*
☆87 Updated 3 weeks ago
sylabs / wlm-operator
Singularity implementation of k8s operator for interacting with SLURM.
☆117 Updated 4 years ago
triton-inference-server / pytorch_backend
The Triton backend for the PyTorch TorchScript models.
☆141 Updated last week
RedisAI / aibench
AIBench, a tool for comparing and evaluating AI serving solutions. forked from [tsbs](https://github.com/timescale/tsbs) and adapted to A…
☆20 Updated 4 months ago
converged-computing / slurm-operator
Testing if I can implement slurm in an operator
☆14 Updated 2 months ago
mlcommons / hpc
Reference implementations of MLPerf™ HPC training benchmarks
☆45 Updated 8 months ago
NVIDIA / LDDL
Distributed preprocessing and data loading for language datasets
☆39 Updated 9 months ago
NVIDIA / nvtx-plugins
Python bindings for NVTX
☆66 Updated last year