imbue-ai / cluster-health
Scripts for managing a large H100 cluster and fixing hardware issues to ensure smooth model training.
323 | Aug 20, 2024 | Updated last year

Alternatives and similar repositories for cluster-health

Users who are interested in cluster-health compare it to the libraries listed below.
