sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆298 · Updated this week
Alternatives and similar repositories for ome
Users interested in ome are comparing it to the repositories listed below.
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆469 · Updated this week
- A workload for deploying LLM inference services on Kubernetes ☆87 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆688 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆441 · Updated last week
- NVIDIA NCCL Tests for Distributed Training ☆116 · Updated last week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆222 · Updated last week
- Offline optimization of your disaggregated Dynamo graph ☆88 · Updated this week
- Efficient and easy multi-instance LLM serving ☆502 · Updated last month
- KV cache store for distributed LLM inference ☆346 · Updated last month
- CUDA checkpoint and restore utility ☆377 · Updated last month
- A lightweight vLLM simulator for mocking out replicas ☆55 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆44 · Updated 3 months ago
- Distributed KV cache coordinator ☆80 · Updated this week
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆261 · Updated 2 weeks ago
- ☆316 · Updated last year
- A toolkit for discovering cluster network topology ☆74 · Updated this week
- The driver for LMCache core to run in vLLM ☆55 · Updated 8 months ago
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆82 · Updated this week
- Inference scheduler for llm-d ☆99 · Updated this week
- LeaderWorkerSet: an API for deploying a group of pods as a unit of replication ☆604 · Updated this week
- Perplexity GPU Kernels ☆513 · Updated this week
- Materials for learning SGLang ☆618 · Updated 3 weeks ago
- 🧯 Kubernetes coverage for fault awareness and recovery; works for any LLMOps, MLOps, or AI workload ☆33 · Updated 2 weeks ago
- Gateway API Inference Extension ☆506 · Updated this week
- ☆258 · Updated last week
- GLake: optimizing GPU memory management and IO transmission ☆486 · Updated 7 months ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆132 · Updated last month
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆228 · Updated last week
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes ☆70 · Updated 3 months ago
- Common recipes to run vLLM ☆196 · Updated this week
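Several of the entries above (OME itself, LeaderWorkerSet, the llm-d scheduler) share a common pattern: treating a group of pods as one replicable serving unit, which is what multi-node LLM inference needs. As a rough illustration only, here is a minimal LeaderWorkerSet manifest; the field names follow the `leaderworkerset.x-k8s.io/v1` API, while the name, image, and sizes are placeholder assumptions, not taken from any repository above:

```yaml
# Hypothetical example: two serving replicas, each a gang of 4 pods
# (1 leader + 3 workers) that are scheduled and scaled together.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-serving        # placeholder name
spec:
  replicas: 2              # number of leader+worker groups
  leaderWorkerTemplate:
    size: 4                # total pods per group, leader included
    workerTemplate:
      spec:
        containers:
        - name: inference
          image: vllm/vllm-openai:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "1"
```

Rolling updates and failure recovery then operate on whole groups rather than individual pods, which is why this API shows up repeatedly alongside the inference schedulers and gateways in this list.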