IBM / Bridge-Operator
Bridge operator repo
☆21 Updated last month
Alternatives and similar repositories for Bridge-Operator
Users who are interested in Bridge-Operator compare it to the libraries listed below.
- A Slurm cluster for Kubernetes ☆60 Updated 11 months ago
- Holistic job manager on Kubernetes ☆116 Updated last year
- A tool to detect infrastructure issues on cloud native AI systems ☆41 Updated last month
- ☆62 Updated last week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆67 Updated last month
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆114 Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆201 Updated last week
- NVIDIA NCCL Tests for Distributed Training ☆97 Updated last week
- Cloud Native Benchmarking of Foundation Models ☆38 Updated 2 weeks ago
- Systematic and comprehensive benchmarks for LLM systems. ☆17 Updated last week
- This repo includes everything you need to know about deploying GPU nodes on OCI ☆32 Updated last week
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆42 Updated 4 years ago
- GenAI inference performance benchmarking tool ☆61 Updated this week
- ☆273 Updated this week
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 Updated 4 years ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆30 Updated 6 months ago
- A toolkit for discovering cluster network topology. ☆54 Updated this week
- RDMA CNI plugin for containerized workloads ☆53 Updated this week
- Device plugins for Volcano, e.g. GPU ☆124 Updated 3 months ago
- Kubernetes Rdma SRIOV device plugin ☆111 Updated 4 years ago
- llm-d benchmark scripts and tooling ☆17 Updated this week
- Resource Exporter for volcano scheduling, e.g. NUMA-Aware scheduling. ☆17 Updated 3 weeks ago
- Distributed KV cache coordinator ☆36 Updated this week
- Health checks for Azure N- and H-series VMs. ☆44 Updated last week
- Inference scheduler for llm-d ☆57 Updated this week
- Run Slurm on Kubernetes. A Slinky project. ☆121 Updated this week
- NVIDIA Network Operator ☆257 Updated this week
- OCI-compatible engine to deploy Linux containers on HPC environments. ☆138 Updated 7 months ago
- Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces ☆26 Updated 6 months ago
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆113 Updated last week