InfiniBand fabric monitoring daemon written in Go
☆32 · May 22, 2025 · Updated 10 months ago
Alternatives and similar repositories for fabricmon
Users that are interested in fabricmon are comparing it to the libraries listed below.
- Converts an Infiniband topology file to graphviz dot format or slurm topology.conf format ☆17 · Feb 2, 2026 · Updated 2 months ago
- Kerberos credential support for batch environments ☆16 · Jul 24, 2024 · Updated last year
- A terminal based monitoring tool for InfiniBand networks using Detector (https://github.com/hhu-bsinfo/detector) ☆15 · Aug 7, 2019 · Updated 6 years ago
- Monitoring and visualization of InfiniBand Fabrics ☆23 · Apr 19, 2021 · Updated 4 years ago
- ☆74 · Oct 25, 2025 · Updated 5 months ago
- Slurm job script archival ☆12 · Apr 6, 2026 · Updated last week
- Generate graphviz dot files from InfiniBand topology dumps. ☆16 · Feb 11, 2024 · Updated 2 years ago
- DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and soft… ☆79 · Apr 7, 2026 · Updated last week
- Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems. ☆46 · Updated this week
- A pure-Go library for Linux device mapper target management ☆22 · Mar 15, 2026 · Updated last month
- NVIDIA NCCL Tests for Distributed Training ☆142 · Apr 6, 2026 · Updated last week
- Command openvswitch_exporter implements a Prometheus exporter for Open vSwitch. ☆38 · Nov 3, 2025 · Updated 5 months ago
- nvloom is a set of tools designed to scalably test MNNVL fabrics. ☆44 · Apr 1, 2026 · Updated 2 weeks ago
- Ansible playbooks used to deploy/configure LIO gateways as a front end to a ceph cluster ☆13 · Sep 21, 2017 · Updated 8 years ago
- Command-line tool to retrieve information and monitor Mellanox un-managed Infiniband switches ☆74 · Nov 17, 2025 · Updated 4 months ago
- Linux Sysinfo Snapshot ☆66 · Feb 22, 2026 · Updated last month
- Scripts for monitoring InfiniBand and storage devices ☆11 · Sep 4, 2015 · Updated 10 years ago
- RPerf: Accurate Latency Measurement Framework for RDMA ☆15 · Sep 24, 2025 · Updated 6 months ago
- Tool to profile usage of HPC resources by regularly probing processes. ☆11 · Apr 9, 2026 · Updated last week
- ☆12 · May 30, 2025 · Updated 10 months ago
- Multi-GPU communication profiler and visualizer ☆39 · Jun 10, 2024 · Updated last year
- Information for the Intro to Cluster System Administration for Non-Sysadmins class ☆10 · Dec 12, 2021 · Updated 4 years ago
- ☆10 · Dec 18, 2025 · Updated 3 months ago
- Unit test generator for Fortran applications using Capture & Replay ☆24 · Nov 4, 2019 · Updated 6 years ago
- Golang bindings for Nvidia Datacenter GPU Manager (DCGM) ☆151 · Apr 9, 2026 · Updated last week
- Appendix resources for Intrinsec's "Amélioration des capacités de détection" handbook. ☆13 · Mar 26, 2018 · Updated 8 years ago
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆48 · Updated this week
- A distributed in-memory key-value storage for billions of small objects. ☆27 · Aug 23, 2019 · Updated 6 years ago
- pytorch code examples for measuring the performance of collective communication calls in AI workloads ☆19 · Sep 18, 2025 · Updated 6 months ago
- The Singularity SPANK plugin provides the users with an interface to launch an application within a Linux container. ☆12 · Nov 4, 2025 · Updated 5 months ago
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage ☆76 · Mar 29, 2026 · Updated 2 weeks ago
- ☆12 · Sep 15, 2025 · Updated 7 months ago
- A remote registry for Singularity Registry HPC 🖊️ ☆15 · Updated this week
- Pocket Survival Guide for Sys Admin - http://psg.skinforum.org/ - ☆15 · Mar 12, 2026 · Updated last month
- ☆11 · Apr 9, 2026 · Updated last week
- AutoParBench is a benchmark framework to evaluate compilers and tools designed to automatically insert OpenMP directives. ☆12 · Nov 6, 2020 · Updated 5 years ago
- Lustre Monitoring System based on Collectd, Grafana and Influxdb ☆46 · Dec 12, 2023 · Updated 2 years ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆481 · Updated this week
- Sun::Kstat perl module for linux-zfs ☆20 · Aug 16, 2013 · Updated 12 years ago