NVIDIA / Bobber
Containerized testing of system components that impact AI workload performance
Related projects:
- Test data for the DALI project
- NGC Container Replicator
- Deploys custom ETL containers on AIStore for subsequent user-defined extract-transform-load processing in parallel, on t…
- Python bindings for UCX
- Tools to deploy GPU clusters in the cloud
- Benchmarks to capture important workloads.
- Imageinary is a reproducible mechanism for generating large image datasets at various resolutions. The tool supports multiple …
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…
- MLflow deployment plugin for Ray Serve
- Deep Learning Benchmarking Suite
- AIBench, a tool for comparing and evaluating AI serving solutions. Forked from [tsbs](https://github.com/timescale/tsbs) and adapted to A…
- MLCube® is a project that reduces friction for machine learning by ensuring that models are easily portable and reproducible.
- A benchmark that measures the performance of popular gradient boosting algorithms on popular ML datasets.
- Third-party dependencies for the DALI project
- struct2tensor is a library for parsing and manipulating structured data inside TensorFlow.
- A top-like tool for monitoring GPUs in a cluster
- Experiments API for experiment tracking on Kubernetes
- A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications.
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.
- oneCCL bindings for PyTorch*
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions
- Scheduling GPU cluster workloads with Slurm
- Run cloud native workloads on NVIDIA GPUs
- Inference Model Manager for Kubernetes
- RAPIDS GPU-BDB
- Machine Learning Inference Graph Spec
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
- FIL backend for the Triton Inference Server
- Scoreboard for ONNX Backend Compatibility
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.