sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆312 · Updated this week
Alternatives and similar repositories for ome
Users interested in ome are comparing it to the libraries listed below.
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆661 · Updated last week
- NVIDIA Inference Xfer Library (NIXL)☆721 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆230 · Updated this week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs☆454 · Updated this week
- A workload for deploying LLM inference services on Kubernetes☆105 · Updated last week
- NVIDIA NCCL Tests for Distributed Training☆123 · Updated last week
- Efficient and easy multi-instance LLM serving☆510 · Updated 2 months ago
- Offline optimization of your disaggregated Dynamo graph☆106 · Updated this week
- KV cache store for distributed LLM inference☆361 · Updated last week
- Distributed KV cache coordinator☆87 · Updated this week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling☆110 · Updated this week
- ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!☆267 · Updated last week
- A lightweight vLLM simulator for mocking out replicas.☆58 · Updated this week
- CUDA checkpoint and restore utility☆384 · Updated 2 months ago
- The driver for LMCache core to run in vLLM☆58 · Updated 9 months ago
- LeaderWorkerSet: An API for deploying a group of pods as a unit of replication☆614 · Updated this week
- Perplexity GPU Kernels☆529 · Updated 2 weeks ago
- Materials for learning SGLang☆644 · Updated last week
- ☆316 · Updated last year
- Inference scheduler for llm-d☆105 · Updated this week
- torchcomms: a modern PyTorch communications API☆278 · Updated last week
- Cloud Native Benchmarking of Foundation Models☆44 · Updated 3 months ago
- GLake: optimizing GPU memory management and IO transmission.☆489 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs☆445 · Updated last month
- Pretrain, finetune and serve LLMs on Intel platforms with Ray☆131 · Updated last month
- A toolkit for discovering cluster network topology.☆83 · Updated this week
- Gateway API Inference Extension☆524 · Updated last week
- 🧯 Kubernetes coverage for fault awareness and recovery; works for any LLMOps, MLOps, or AI workloads.☆33 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the …☆234 · Updated this week
- Perplexity open source garden for inference technology☆182 · Updated 2 weeks ago