GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
☆222 · May 1, 2026 · Updated last week
Alternatives and similar repositories for gcm
Users interested in gcm are comparing it to the repositories listed below.
- Prometheus collector and exporter for Slurm cluster metrics. A Slinky project. ☆16 · Nov 7, 2025 · Updated 6 months ago
- ☆29 · May 2, 2026 · Updated last week
- Deterministic security layer for Openclaw (Clawdbot), Cursor and Claude Code. Write secure code, prevent data exfil, and more ☆44 · Feb 5, 2026 · Updated 3 months ago
- Add GPU support to your Singularity container! ☆15 · Jul 20, 2017 · Updated 8 years ago
- ☆11 · Oct 30, 2023 · Updated 2 years ago
- ☆16 · Dec 22, 2017 · Updated 8 years ago
- A small, cute server platform :* ☆15 · Jul 31, 2015 · Updated 10 years ago
- Random (silly) name generator for Golang ☆20 · Oct 16, 2019 · Updated 6 years ago
- PrintQueue: Performance Diagnosis via Queue Measurement in the Data Plane ☆19 · Jun 23, 2023 · Updated 2 years ago
- Kubernetes DRA Network Driver ☆11 · Dec 12, 2024 · Updated last year
- The agentic server framework. ☆81 · Apr 22, 2026 · Updated 2 weeks ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆288 · Updated this week
- GluonNLP tutorial for Pycon2019 ☆14 · Aug 16, 2019 · Updated 6 years ago
- A simple tool for parsing the profile.json file of mxnet ☆14 · Aug 1, 2018 · Updated 7 years ago
- Mantis: Reactive Programmable Switches (SIGCOMM 2020) ☆22 · Aug 21, 2020 · Updated 5 years ago
- Deploying EFA in EKS utilizing GPUDirectRDMA where supported ☆36 · Oct 15, 2024 · Updated last year
- Static Sites with Ruby on Heroku/Cedar ☆33 · Sep 1, 2014 · Updated 11 years ago
- Optimized primitives for collective multi-GPU communication ☆25 · Apr 17, 2024 · Updated 2 years ago
- ☆11 · Aug 24, 2023 · Updated 2 years ago
- NNVM for ROCm Examples ☆19 · Nov 22, 2017 · Updated 8 years ago
- Examples of building probabilistic models with MXNet linear algebra operators ☆23 · Oct 24, 2017 · Updated 8 years ago
- Expose telemetry data in a graphical way to analyze it in real-time with Grafana dashboards ☆21 · Mar 13, 2021 · Updated 5 years ago
- Deploy your own private OpenAI-compatible LLM ☆27 · Jun 5, 2025 · Updated 11 months ago
- Declarative and reactive terminal UI library for Rust ☆14 · Mar 7, 2023 · Updated 3 years ago
- Official code for kTrans: Knowledge-Aware Transformer for Binary Code Embedding ☆30 · Dec 17, 2023 · Updated 2 years ago
- A simple, generic, and flexible keyframe animation library for Rust. ☆30 · Mar 27, 2026 · Updated last month
- ☆16 · Nov 6, 2019 · Updated 6 years ago
- MXNet implementation of Graph Convolutional Neural Networks ☆20 · Oct 8, 2018 · Updated 7 years ago
- 🏠 Homebridge plugin for SmartRent installations ☆19 · Aug 12, 2022 · Updated 3 years ago
- Security layer for AI coding agents. Works with Claude Code, Cursor, Windsurf, Gemini CLI, OpenCode, Pi Agent and more. ☆118 · Updated this week
- In this repository you will find code for our capstone project "How to Predict Stock Movements Using NLP Techniques". The code has been a… ☆28 · Feb 16, 2021 · Updated 5 years ago
- Deep learning study in Gluon 2nd edition ☆24 · Mar 6, 2019 · Updated 7 years ago
- Official Implementation of Knowledge Flow Prompting ☆35 · Oct 20, 2025 · Updated 6 months ago
- How to build a sentence embedding application using BentoML ☆15 · Mar 31, 2025 · Updated last year
- A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch ☆265 · Apr 30, 2026 · Updated last week
- ☆24 · Dec 18, 2020 · Updated 5 years ago
- Intermediate MPI lesson ☆28 · Apr 29, 2023 · Updated 3 years ago
- Dataflow is a data processing library, primarily for machine learning. ☆24 · Jun 6, 2023 · Updated 2 years ago
- Training framework with a goal to explore the frontier of sample efficiency of small language models ☆100 · Jan 25, 2026 · Updated 3 months ago