IBM / Bridge-Operator
Bridge operator repo
☆21 Updated 2 months ago
Alternatives and similar repositories for Bridge-Operator
Users who are interested in Bridge-Operator are comparing it to the libraries listed below.
- llm-d benchmark scripts and tooling ☆33 Updated this week
- A tool to detect infrastructure issues on cloud native AI systems ☆52 Updated 2 months ago
- A toolkit for discovering cluster network topology. ☆84 Updated this week
- Health checks for Azure N- and H-series VMs. ☆55 Updated 2 weeks ago
- ☆69 Updated last week
- Cray-HPE System Management Documentation for Shasta, High-Performance-Computing-as-a-Service (HPCaaS). ☆31 Updated this week
- ☆268 Updated this week
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆43 Updated 4 years ago
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆71 Updated 4 months ago
- MIG Partition Editor for NVIDIA GPUs ☆228 Updated last week
- Cloud Native Benchmarking of Foundation Models ☆44 Updated 3 months ago
- Run Slurm as a Kubernetes scheduler. A Slinky project. ☆50 Updated last week
- Run Slurm on Kubernetes. A Slinky project. ☆193 Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆140 Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆124 Updated 2 weeks ago
- Project to manage Flux tasks needed to standardize Kubernetes HPC scheduling interfaces ☆26 Updated 11 months ago
- A Slurm cluster for Kubernetes ☆66 Updated last year
- ☆313 Updated this week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆497 Updated this week
- Holistic job manager on Kubernetes ☆115 Updated last year
- NVIDIA Network Operator ☆301 Updated this week
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆33 Updated last week
- A distributed system for Agentic AI ☆32 Updated this week
- Prometheus exporter for an InfiniBand fabric ☆68 Updated last year
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 Updated 4 years ago
- Prometheus collector and exporter for Slurm cluster metrics. A Slinky project. ☆14 Updated 3 weeks ago
- GPU plugin to the node feature discovery for Kubernetes ☆308 Updated last year
- Run cloud native workloads on NVIDIA GPUs ☆208 Updated last month
- llm-d helm charts and deployment examples ☆46 Updated last week
- Testing if I can implement slurm in an operator ☆15 Updated last year