IBM / Bridge-Operator
Bridge operator repo
☆21 · Updated 3 months ago
Alternatives and similar repositories for Bridge-Operator
Users who are interested in Bridge-Operator are comparing it to the repositories listed below.
- Cloud Native Benchmarking of Foundation Models ☆39 · Updated last week
- A tool to detect infrastructure issues on cloud native AI systems ☆44 · Updated 2 weeks ago
- llm-d benchmark scripts and tooling ☆21 · Updated this week
- ☆64 · Updated last week
- ☆256 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆102 · Updated 2 weeks ago
- A Slurm cluster for Kubernetes ☆62 · Updated last year
- Holistic job manager on Kubernetes ☆117 · Updated last year
- MIG Partition Editor for NVIDIA GPUs ☆207 · Updated this week
- A toolkit for discovering cluster network topology ☆61 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment ☆120 · Updated this week
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆42 · Updated 4 years ago
- ☆47 · Updated last week
- ☆283 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆405 · Updated last week
- Health checks for Azure N- and H-series VMs ☆48 · Updated this week
- NVIDIA Network Operator ☆268 · Updated this week
- Run Slurm on Kubernetes. A Slinky project. ☆140 · Updated this week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes ☆69 · Updated 3 weeks ago
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆487 · Updated 2 weeks ago
- Testing if I can implement Slurm in an operator ☆15 · Updated 9 months ago
- Systematic and comprehensive benchmarks for LLM systems ☆24 · Updated last month
- Cray-HPE System Management Documentation for Shasta, High-Performance-Computing-as-a-Service (HPCaaS) ☆29 · Updated this week
- Distributed KV cache coordinator ☆46 · Updated this week
- Kubernetes RDMA SR-IOV device plugin ☆111 · Updated 4 years ago
- ☆119 · Updated 2 years ago
- A distributed engine for elastic workloads ☆27 · Updated this week
- GPU plugin to the node feature discovery for Kubernetes ☆302 · Updated last year
- Run cloud native workloads on NVIDIA GPUs ☆188 · Updated last week
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆116 · Updated this week