sgl-project / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton.
☆351 · Updated this week
Alternatives and similar repositories for ome
Users interested in ome are comparing it to the libraries listed below.
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆730 · Updated last month
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving ☆250 · Updated 3 weeks ago
- Offline optimization of your disaggregated Dynamo graph ☆136 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆783 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆468 · Updated this week
- A workload for deploying LLM inference services on Kubernetes ☆148 · Updated last week
- NVIDIA NCCL Tests for Distributed Training ☆130 · Updated last week
- KV cache store for distributed LLM inference ☆378 · Updated last month
- Efficient and easy multi-instance LLM serving ☆519 · Updated 3 months ago
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! ☆282 · Updated 2 weeks ago
- A lightweight vLLM simulator for mocking out replicas. ☆68 · Updated last week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆133 · Updated last week
- The driver for LMCache core to run in vLLM ☆59 · Updated 10 months ago
- Distributed KV cache coordinator ☆92 · Updated last week
- CUDA checkpoint and restore utility ☆398 · Updated 3 months ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication ☆642 · Updated last week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆357 · Updated this week
- A high-performance and lightweight router for vLLM large-scale deployment ☆65 · Updated this week
- Inference scheduler for llm-d ☆111 · Updated this week
- ☆318 · Updated last year
- A toolkit for discovering cluster network topology. ☆88 · Updated 2 weeks ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution ☆78 · Updated last week
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from training… ☆118 · Updated last week
- Materials for learning SGLang ☆703 · Updated 2 weeks ago
- Cloud Native Benchmarking of Foundation Models ☆44 · Updated 5 months ago
- ☆273 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆547 · Updated last month
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- GenAI inference performance benchmarking tool ☆137 · Updated last week
- Gateway API Inference Extension ☆550 · Updated last week