sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs).
☆324 · Updated this week
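Because OME is a Kubernetes operator, a quick way to explore a cluster running it is the official Kubernetes Python client. Below is a minimal sketch with hypothetical CRD coordinates (group, version, plural are illustrative assumptions, not OME's confirmed API surface); read the real resource names from the operator's installed CRDs with `kubectl get crds`.

```python
# Minimal sketch: listing an operator's custom resources with the official
# Kubernetes Python client. The group/version/plural values are hypothetical
# placeholders; verify the real ones with `kubectl get crds`.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in a pod
api = client.CustomObjectsApi()

objs = api.list_cluster_custom_object(
    group="ome.example.io",      # hypothetical API group
    version="v1beta1",           # hypothetical version
    plural="inferenceservices",  # hypothetical resource plural
)
for item in objs.get("items", []):
    meta = item["metadata"]
    print(f'{meta.get("namespace", "-")}/{meta["name"]}')
```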
Alternatives and similar repositories for ome
Users interested in ome are comparing it to the libraries listed below.
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆701 · Updated last week
- A workload for deploying LLM inference services on Kubernetes ☆123 · Updated last week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆461 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… (see the token-latency sketch after this list) ☆236 · Updated 2 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆753 · Updated this week
- Offline optimization of your disaggregated Dynamo graph ☆121 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆129 · Updated 3 weeks ago
- Efficient and easy multi-instance LLM serving ☆515 · Updated 3 months ago
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆273 · Updated 2 weeks ago
- Distributed KV cache coordinator ☆91 · Updated last week
- KV cache store for distributed LLM inference ☆371 · Updated 3 weeks ago
- CUDA checkpoint and restore utility ☆395 · Updated 2 months ago
- A lightweight vLLM simulator for mocking out replicas ☆59 · Updated last week
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication (see the LeaderWorkerSet sketch after this list) ☆624 · Updated this week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆119 · Updated last week
- The driver for LMCache core to run in vLLM ☆59 · Updated 10 months ago
- ☆317 · Updated last year
- Kubernetes-native AI serving platform for scalable model serving ☆92 · Updated this week
- Inference scheduler for llm-d ☆110 · Updated this week
- ☆268 · Updated 2 weeks ago
- Cloud Native Benchmarking of Foundation Models ☆44 · Updated 4 months ago
- Gateway API Inference Extension ☆537 · Updated last week
- Materials for learning SGLang ☆667 · Updated last week
- Perplexity GPU Kernels ☆536 · Updated last month
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆130 · Updated 2 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆345 · Updated this week
- torchcomms: a modern PyTorch communications API ☆298 · Updated last week
- 🧯 Kubernetes coverage for fault awareness and recovery; works with any LLMOps, MLOps, or AI workload ☆33 · Updated last week
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… ☆108 · Updated this week
- Serverless LLM Serving for Everyone ☆622 · Updated this week
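Two of the entries above lend themselves to short illustrations. First, the token-level evaluation that tools like genai-bench perform comes down to timing a streaming response. A minimal sketch, assuming an OpenAI-compatible completions endpoint at a placeholder URL; genai-bench itself adds proper tokenizer-aware accounting, concurrency, and reporting.

```python
# Sketch of token-level latency measurement against an OpenAI-compatible
# streaming endpoint. The URL and model name are placeholders.
import json
import time

import requests

URL = "http://localhost:8000/v1/completions"  # assumed local server

def measure(prompt: str, model: str = "my-model") -> None:
    payload = {"model": model, "prompt": prompt, "max_tokens": 128, "stream": True}
    start = time.perf_counter()
    token_times = []  # arrival time of each streamed chunk
    with requests.post(URL, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            # Server-sent events: payload lines are prefixed with "data: ".
            if not line.startswith(b"data: "):
                continue
            data = line[len(b"data: "):]
            if data == b"[DONE]":
                break
            json.loads(data)  # one streamed chunk, roughly one token's worth
            token_times.append(time.perf_counter())
    if not token_times:
        print("no tokens received")
        return
    ttft = token_times[0] - start  # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0  # mean inter-token latency
    print(f"TTFT {ttft * 1000:.1f} ms, mean inter-token latency {itl * 1000:.1f} ms")

measure("Explain KV caching in one sentence.")
```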
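Second, the LeaderWorkerSet entry: each replica is a leader pod plus N worker pods scheduled and replicated as one unit, so the object roughly pairs a group size with a pod template. A sketch assuming the v1 API of kubernetes-sigs/lws; treat the field names as assumptions and verify them against the project's documentation.

```python
# Sketch: creating a LeaderWorkerSet with the Kubernetes Python client.
# Field names follow the upstream kubernetes-sigs/lws v1 API as best
# understood here; verify against the LWS docs before relying on them.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

lws = {
    "apiVersion": "leaderworkerset.x-k8s.io/v1",
    "kind": "LeaderWorkerSet",
    "metadata": {"name": "llm-servers"},
    "spec": {
        "replicas": 2,            # two leader+workers groups
        "leaderWorkerTemplate": {
            "size": 4,            # 1 leader + 3 workers per group
            "workerTemplate": {   # a plain PodTemplateSpec
                "spec": {
                    "containers": [{
                        "name": "server",
                        "image": "my-inference-image:latest",  # placeholder
                    }]
                }
            },
        },
    },
}

api.create_namespaced_custom_object(
    group="leaderworkerset.x-k8s.io",
    version="v1",
    namespace="default",
    plural="leaderworkersets",
    body=lws,
)
```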