scitix/sichek



Sichek is a tool for detecting and diagnosing node-level issues in AI environments, ensuring the reliability and high performance of GPU-intensive workloads. It proactively identifies hardware and software problems and triggers timely automated corrective actions, including task retries and operational maintenance.

☆27

Alternatives and similar repositories for sichek

Users who are interested in sichek are comparing it to the repositories listed below.


scitix / SiMM
SiMM: Scalable in-Memory Middleware
☆41 • Apr 20, 2026 • Updated 3 months ago
Zhaojp-Frank / AwesomePaper-for-AI
Awesome system papers for AI
☆21 • Updated this week
scitix / arks
Arks is a cloud-native inference framework running on Kubernetes
☆51 • May 14, 2026 • Updated 2 months ago
treydock / infiniband_exporter
☆78 • Oct 25, 2025 • Updated 8 months ago
scitix / netpulse
API Server for Network and Linux Automation
☆97 • May 4, 2026 • Updated 2 months ago
idiv-biodiversity / mmdu
Disk usage for IBM Storage Scale file systems
☆12 • Jul 14, 2026 • Updated last week
stanford-rc / ibswinfo
Command-line tool to retrieve information and monitor Mellanox un-managed Infiniband switches
☆77 • Nov 17, 2025 • Updated 8 months ago
IBM / ibm-spectrum-scale-bridge-for-grafana
This tool allows IBM Storage Scale users to perform performance monitoring for IBM Storage Scale devices using third-party applications s…
☆45 • Jul 1, 2026 • Updated 2 weeks ago
babodx / zabbix_import_hosts
Batch import of monitored hosts into Zabbix
☆10 • Feb 2, 2015 • Updated 11 years ago
ebeahan / aeon-ztps
Multi-Vendor ZTP Server
☆18 • Apr 1, 2021 • Updated 5 years ago
dravetech / network-validation-napalm
Validating Network Deployments with NAPALM
☆11 • Feb 1, 2018 • Updated 8 years ago
scitix / InstantTensor
An ultra-fast, distributed Safetensors loader
☆67 • Updated this week
hhu-bsinfo / ib-scanner
A terminal-based monitoring tool for InfiniBand networks using Detector (https://github.com/hhu-bsinfo/detector)
☆15 • Aug 7, 2019 • Updated 6 years ago
hpreston / demo_mac_to_interface_tool
This is an example script that leverages pyATS to look up the switch interfaces where MAC addresses are located.
☆12 • Jan 4, 2021 • Updated 5 years ago
hoelsner / networkconfgen
Jinja2-based configuration generator with some extensions required to generate configurations for network devices. It's built on top of …
☆19 • Oct 11, 2017 • Updated 8 years ago
jabl / ibtopotool
Converts an Infiniband topology file to graphviz dot format or slurm topology.conf format
☆18 • Feb 2, 2026 • Updated 5 months ago
laszlocph / tsdbinfo
Understand the series and labels you store in Prometheus
☆23 • Aug 28, 2019 • Updated 6 years ago
leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆486 • Updated this week
FarisZR / caddy-dns-OCI
Automated OCI (Docker image) builds for (almost) all Caddy DNS plugins
☆13 • Jun 15, 2026 • Updated last month
StackStorm-Exchange / stackstorm-netbox
☆14 • Jun 23, 2025 • Updated last year
ShiinaOrez / Tutor-Go
A Go language tutorial by "不写代码的屁股" (roughly, "the butt that doesn't write code").
☆10 • Nov 21, 2020 • Updated 5 years ago
Mellanox / mlnx-tools
Mellanox userland tools and scripts
☆147 • Updated this week
tbotnz / cmdboss
API-driven, integrated configuration management
☆25 • May 8, 2021 • Updated 5 years ago
infiniband-radar / infiniband-radar-daemon
☆15 • Nov 25, 2021 • Updated 4 years ago
vasya4k / gojun
Simple NETCONF API example
☆16 • Dec 10, 2017 • Updated 8 years ago
jeremyschulman / netbox-pyswagger
Python Swagger client for Netbox
☆18 • Mar 13, 2019 • Updated 7 years ago
mumoshu / node-detacher
Practically and gracefully stop your K8s node on (termination|scale down|maintenance)
☆13 • Jul 15, 2020 • Updated 6 years ago
rafayopen / pingmesh
Pingmesh measures and reports network performance and availability of a set of communicating peers
☆48 • Nov 6, 2019 • Updated 6 years ago
hongfeioo / H3C_netconf_lib
Operates H3C switches over the NETCONF protocol; supports adding and deleting static route entries, among other functions
☆13 • May 26, 2017 • Updated 9 years ago
inclusionAI / AState
☆41 • Dec 9, 2025 • Updated 7 months ago
rackslab / RacksDB
YAML-based database of datacenter infrastructures
☆31 • Jun 4, 2026 • Updated last month
CLIP-HPC / goslmailer
GoSlurmMailer: a drop-in replacement for the default slurm MailProg. Delivers slurm job messages to various destinations.
☆48 • Oct 1, 2024 • Updated last year
inclusionAI / Awex
A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin…
☆165 • May 25, 2026 • Updated last month
dravetech / preso_abstract_all_the_things
Abstract all the things
☆14 • Jun 18, 2016 • Updated 10 years ago
networkop / network-as-a-service
Network-as-a-Service Proof-of-Concept
☆18 • Feb 1, 2020 • Updated 6 years ago
Sidarion / netbox-joined-inventory
Netbox_joined_inventory is a Python script that gathers data from a Netbox source-of-truth and stores them as Ansible inventory, group_va…
☆22 • Jul 29, 2020 • Updated 5 years ago
napalm-automation-community / napalm-huawei-vrp
NAPALM Driver for Huawei VRP5/VRP8 Routers and Switches
☆96 • Apr 13, 2026 • Updated 3 months ago
ketgo / nameko-kafka
Kafka extension for Nameko framework
☆18 • Jul 12, 2023 • Updated 3 years ago
nmilford / rpm-python27
An RPM spec file to build and alt-install Python 2.7 on RHEL.
☆26 • May 13, 2016 • Updated 10 years ago