IBM / Bridge-Operator
Bridge operator repo
☆21 Updated 3 months ago
Alternatives and similar repositories for Bridge-Operator
Users who are interested in Bridge-Operator are comparing it to the libraries listed below.
- llm-d benchmark scripts and tooling ☆41 Updated this week
- Cloud Native Benchmarking of Foundation Models ☆44 Updated 5 months ago
- Health checks for Azure N- and H-series VMs. ☆55 Updated 3 weeks ago
- A tool to detect infrastructure issues on cloud native AI systems ☆52 Updated 3 months ago
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆73 Updated 5 months ago
- ☆70 Updated this week
- Run Slurm on Kubernetes. A Slinky project. ☆213 Updated 2 weeks ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆140 Updated 3 weeks ago
- A toolkit for discovering cluster network topology. ☆89 Updated this week
- Holistic job manager on Kubernetes ☆115 Updated last year
- Project to manage Flux tasks needed to standardize Kubernetes HPC scheduling interfaces ☆26 Updated this week
- A Slurm cluster for Kubernetes ☆67 Updated last year
- NVIDIA NCCL Tests for Distributed Training ☆132 Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆235 Updated this week
- Inference scheduler for llm-d ☆117 Updated this week
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆505 Updated 3 weeks ago
- llm-d helm charts and deployment examples ☆48 Updated last month
- A workload for deploying LLM inference services on Kubernetes ☆153 Updated this week
- GPU plugin to the node feature discovery for Kubernetes ☆308 Updated last year
- ☆276 Updated last month
- A federation scheduler for multi-cluster ☆59 Updated this week
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆42 Updated this week
- GenAI inference performance benchmarking tool ☆140 Updated 3 weeks ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆34 Updated 3 weeks ago
- Run Slurm as a Kubernetes scheduler. A Slinky project. ☆56 Updated 3 weeks ago
- NVIDIA Network Operator ☆315 Updated last week
- JobSet: a k8s native API for distributed ML training and HPC workloads ☆297 Updated this week
- ☆328 Updated 2 weeks ago
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆44 Updated 4 years ago
- Prometheus exporter for an InfiniBand fabric ☆68 Updated 2 years ago