IBM / Bridge-Operator
Bridge operator repo
☆21 Updated last week
Alternatives and similar repositories for Bridge-Operator
Users who are interested in Bridge-Operator are comparing it to the libraries listed below
- llm-d benchmark scripts and tooling ☆28 Updated this week
- Cloud Native Benchmarking of Foundation Models ☆42 Updated last month
- A tool to detect infrastructure issues on cloud native AI systems ☆47 Updated last week
- ☆65 Updated last week
- Health checks for Azure N- and H-series VMs. ☆51 Updated last month
- NVIDIA NCCL Tests for Distributed Training ☆111 Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆129 Updated this week
- Holistic job manager on Kubernetes ☆116 Updated last year
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆69 Updated 2 months ago
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆43 Updated 4 years ago
- A Slurm cluster for Kubernetes ☆63 Updated last year
- Distributed KV cache coordinator ☆71 Updated this week
- A toolkit for discovering cluster network topology. ☆69 Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆213 Updated 2 weeks ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆33 Updated this week
- A workload for deploying LLM inference services on Kubernetes ☆43 Updated this week
- NVIDIA DRA Driver for GPUs ☆446 Updated last week
- Integrations between commercial and open source applications and LSF published by IBM and others. ☆16 Updated last year
- Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces ☆26 Updated 9 months ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆496 Updated this week
- ☆263 Updated 3 weeks ago
- A simulator of Kubernetes for batch and service workloads. ☆49 Updated 4 years ago
- Cray-HPE System Management Documentation for Shasta, High-Performance-Computing-as-a-Service (HPCaaS). ☆29 Updated this week
- GenAI inference performance benchmarking tool ☆97 Updated this week
- Testing if I can implement slurm in an operator ☆15 Updated 10 months ago
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 Updated 2 months ago
- elastic-gpu-scheduler is a Kubernetes scheduler extender for GPU resources scheduling. ☆145 Updated 2 years ago
- ☆299 Updated last week
- llm-d helm charts and deployment examples ☆42 Updated this week
- The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to ma… ☆84 Updated this week