sgl-project/rbg

A workload for deploying LLM inference services on Kubernetes

☆263

Alternatives and similar repositories for rbg

Users interested in rbg are comparing it to the libraries listed below.

kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆769 · Updated this week
ome-projects / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆481 · Updated this week
volcano-sh / kthena
Kubernetes-native AI serving platform for scalable model serving.
☆396 · Updated this week
ai-dynamo / modelexpress
Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…
☆86 · Updated this week
ai-dynamo / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆243 · Updated this week
volcano-sh / agentcube
☆157 · Updated this week
openkruise / agents
Rapid and cost-effective operator and best practice for agent sandbox lifecycle management.
☆243 · Updated this week
llm-d-incubation / llm-d-fast-model-actuation
Kubernetes controllers for fast model actuation using vLLM sleep/wake and launcher-based model swapping
☆16 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,968 · Updated this week
novitalabs / pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S…
☆183 · Updated this week
koordinator-sh / koordinator
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, …
☆1,724 · Updated this week
kai-scheduler / KAI-Scheduler
KAI Scheduler is an open-source, Kubernetes-native scheduler for AI workloads at large scale
☆1,408 · Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,147 · Updated this week
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆723 · Updated this week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆372 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,563 · Updated this week
AliyunContainerService / ack-ram-authenticator
Using Alibaba Cloud credentials to authenticate to a Kubernetes cluster
☆32 · Sep 13, 2024 · Updated last year
Project-HAMi / HAMi
Heterogeneous GPU Sharing on Kubernetes
☆4,042 · Updated this week
aliyun / kvc-3fs-operator
☆42 · Apr 16, 2026 · Updated 3 months ago
kubernetes-sigs / inference-perf
GenAI inference performance benchmarking tool
☆212 · Updated this week
llm-d / llm-d
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
☆3,863 · Updated this week
fluid-cloudnative / fluid
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
☆1,955 · Updated this week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆314 · Updated this week
lightseekorg / smg
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini &…
☆413 · Updated this week
aigw-project / aigw
The Intelligent Inference Scheduler for Large-scale Inference Services.
☆68 · Feb 12, 2026 · Updated 5 months ago
AliyunContainerService / helm-acr
Alibaba Cloud's Helm plugin to push chart packages to ChartMuseum.
☆22 · Dec 3, 2021 · Updated 4 years ago
Project-HAMi / HAMi-core
HAMi-core compiles libvgpu.so, which enforces a hard limit on GPU resources inside containers
☆319 · Updated this week
llumnix-project / llumnix-ray
Efficient and easy multi-instance LLM serving
☆563 · Mar 12, 2026 · Updated 4 months ago
kubewharf / katalyst-core
Katalyst aims to provide a universal solution to help improve resource utilization and optimize the overall costs in the cloud. This is t…
☆560 · Updated this week
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆425 · Nov 13, 2025 · Updated 8 months ago
Mellanox / k8s-rdma-shared-dev-plugin
☆375 · Updated this week
volcano-sh / volcano
A Cloud Native Batch System (Project under CNCF)
☆5,803 · Updated this week
knoway-dev / knoway
An Envoy-inspired, ultimate LLM-first gateway for LLM serving, downstream application developers, and enterprises
☆27 · Apr 24, 2025 · Updated last year
BaizeAI / kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
☆35 · Updated this week
antgroup / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆36 · Updated this week
copilot-io / runtime-copilot
The main purpose of runtime copilot is to assist with node runtime management tasks such as configuring registries, upgrading versions, i…
☆13 · May 16, 2023 · Updated 3 years ago
kubernetes-sigs / wg-serving
WG Serving
☆38 · Mar 24, 2026 · Updated 3 months ago
AliyunContainerService / et-operator
Kubernetes Operator for AI and Bigdata Elastic Training
☆91 · Jan 10, 2025 · Updated last year
modelpack / model-spec
An Open Standard for Packaging, Distributing and Running LLMs in Cloud-Native Environments
☆218 · Updated this week