DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.
☆89 · May 12, 2026 · Updated last week
Alternatives and similar repositories for dgxc-benchmarking
Users that are interested in dgxc-benchmarking are comparing it to the libraries listed below.
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆67 · Updated this week
- nv-one-logger enables tracking of GPU application progress over time and can help to identify overhead from workload and cluster ineffici… ☆23 · Nov 6, 2025 · Updated 6 months ago
- Linux Sysinfo Snapshot ☆66 · May 14, 2026 · Updated last week
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage ☆83 · May 12, 2026 · Updated last week
- InfiniBand fabric monitoring daemon written in Go ☆32 · May 22, 2025 · Updated last year
- A Slurm-based HPC workload management environment, driven by Ansible. ☆72 · Updated this week
- A toolkit for discovering cluster network topology. ☆130 · May 15, 2026 · Updated last week
- NVIDIA Fleet Command is a hybrid-cloud platform for securely and remotely deploying, managing, and scaling AI across dozens or up to thou… ☆14 · Jul 20, 2022 · Updated 3 years ago
- A Kubernetes Operator to manage Node OS customizations. ☆55 · Updated this week
- Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes ☆300 · Updated this week
- Parallel Computing -- Validation Suite: Validation engine for Exascale project benchmarks ☆16 · Mar 26, 2026 · Updated last month
- Optimized primitives for collective multi-GPU communication ☆10 · May 8, 2024 · Updated 2 years ago
- Scripts to customize AWS ParallelCluster ☆29 · Sep 5, 2025 · Updated 8 months ago
- A distributed storage benchmark for file systems, object stores & block devices with support for GPUs ☆268 · May 9, 2026 · Updated last week
- Create an Amazon EKS cluster and run a distributed training example ☆29 · Aug 19, 2024 · Updated last year
- Utility for monitoring process, thread, OS and HW resources. ☆20 · Updated this week
- This repository contains the results and code for the MLPerf™ Training v4.0 benchmark. ☆12 · Jun 11, 2024 · Updated last year
- PLASMA parallel library for dense linear algebra. ☆10 · May 30, 2017 · Updated 8 years ago
- An Ansible role for installing and configuring CernVM-FS (CVMFS) ☆16 · Mar 2, 2026 · Updated 2 months ago
- Multi-GPU communication profiler and visualizer ☆41 · Jun 10, 2024 · Updated last year
- Aries Network Performance Counters Monitoring Library ☆11 · Nov 19, 2020 · Updated 5 years ago
- Empirical-Research Toolkit ☆11 · Apr 29, 2026 · Updated 3 weeks ago
- Information for the Intro to Cluster System Administration for Non-Sysadmins class ☆10 · Dec 12, 2021 · Updated 4 years ago
- Sample Codes using NVSHMEM on Multi-GPU ☆30 · Jan 22, 2023 · Updated 3 years ago
- ☆10 · Dec 18, 2025 · Updated 5 months ago
- Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems. ☆46 · May 15, 2026 · Updated last week
- ☆256 · Updated this week
- A small C++ wrapper for managing Linux CPU sets and CPU affinity ☆11 · Dec 11, 2025 · Updated 5 months ago
- pytorch code examples for measuring the performance of collective communication calls in AI workloads ☆20 · Sep 18, 2025 · Updated 8 months ago
- NVIDIA Infra Controller - Hardware Lifecycle Management and multitenant networking ☆149 · May 16, 2026 · Updated last week
- A complete CUDA tutorial ranging from first GPU programs to advanced asynchronous methods ☆30 · Jan 22, 2026 · Updated 4 months ago
- Run Slurm on Kubernetes. A Slinky project. ☆296 · May 15, 2026 · Updated last week
- A remote registry for Singularity Registry HPC 🖊️ ☆15 · Updated this week
- Pocket Survival Guide for Sys Admin - http://psg.skinforum.org/ - ☆15 · May 11, 2026 · Updated last week
- A wrapper around SageMaker ML Lineage Tracking extending ML Lineage to end-to-end ML lifecycles, including additional capabilities around… ☆16 · Oct 14, 2021 · Updated 4 years ago
- Enables HPC Environment in an OpenStack Cloud ☆11 · Jan 12, 2018 · Updated 8 years ago
- Bunch of helper files for the Slurm resource manager ☆15 · Apr 21, 2026 · Updated last month
- A compact and extensible image viewer ☆12 · Jun 22, 2020 · Updated 5 years ago
- A new memory mapping interface for efficient direct user-space access to byte-addressable storage, published in MICRO2022. ☆16 · Sep 29, 2022 · Updated 3 years ago