sgl-project / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
☆356 · Updated this week
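As a rough illustration of the pattern an operator like OME implements: it reconciles a custom resource describing a desired model deployment into pods, GPU requests, and services. The manifest below is a hypothetical sketch only; the API group, kind, and field names are invented for illustration and are not OME's actual CRD schema.

```yaml
# Hypothetical custom resource for an LLM-serving operator.
# All names here are illustrative; consult the project's docs
# for the real CRD schema.
apiVersion: serving.example.io/v1alpha1
kind: InferenceService
metadata:
  name: llama3-8b
spec:
  model:
    name: meta-llama/Meta-Llama-3-8B-Instruct  # model to pull and serve
  runtime: sglang                              # backend engine (e.g. SGLang, vLLM)
  replicas: 2
  resources:
    limits:
      nvidia.com/gpu: 1                        # one GPU per replica
```

The operator watches resources of this kind and drives the cluster toward the declared state, handling model download, scheduling onto GPU nodes, and rollout of the serving engine.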
Alternatives and similar repositories for ome
Users interested in ome are comparing it to the repositories listed below.
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆753 · Updated last week
- A workload for deploying LLM inference services on Kubernetes ☆156 · Updated last week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆251 · Updated this week
- Offline optimization of your disaggregated Dynamo graph ☆151 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆469 · Updated this week
- A lightweight vLLM simulator for mocking out replicas ☆83 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆820 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆133 · Updated last week
- Efficient and easy multi-instance LLM serving ☆521 · Updated 4 months ago
- KV cache store for distributed LLM inference ☆385 · Updated 2 months ago
- The driver for LMCache core to run in vLLM ☆58 · Updated 11 months ago
- Distributed KV cache scheduling & offloading libraries ☆98 · Updated this week
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆286 · Updated last week
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆90 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆44 · Updated 5 months ago
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆143 · Updated this week
- A high-performance and lightweight router for vLLM large-scale deployment ☆82 · Updated 3 weeks ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆375 · Updated this week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆652 · Updated this week
- Inference scheduler for llm-d ☆120 · Updated this week
- CUDA checkpoint and restore utility ☆403 · Updated 4 months ago
- Kubernetes-native AI platform for scalable model serving ☆168 · Updated this week
- A toolkit for discovering cluster network topology ☆90 · Updated this week
- Materials for learning SGLang ☆717 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆553 · Updated 2 months ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- GLake: optimizing GPU memory management and IO transmission ☆496 · Updated 9 months ago
- torchcomms: a modern PyTorch communications API ☆320 · Updated last week
- Gateway API Inference Extension ☆563 · Updated last week