llm-d / llm-d-deployer
Helm charts for llm-d
☆32 Updated last week
Alternatives and similar repositories for llm-d-deployer
Users who are interested in llm-d-deployer are comparing it to the repositories listed below.
- Inference scheduler for llm-d ☆41 Updated this week
- ☆19 Updated last week
- AppWrapper controller for Kueue ☆13 Updated this week
- GenAI inference performance benchmarking tool ☆44 Updated this week
- Distributed KV cache coordinator ☆31 Updated last week
- Repository to demo GPU Sharing with Time Slicing, MPS, MIG and others ☆41 Updated 7 months ago
- Simplified model deployment on llm-d ☆18 Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆103 Updated this week
- ☆41 Updated 2 months ago
- InstaSlice facilitates the use of Dynamic Resource Allocation (DRA) on Kubernetes clusters for GPU sharing ☆27 Updated 6 months ago
- Artifacts for the Distributed Workloads stack as part of ODH ☆31 Updated this week
- WG Serving ☆25 Updated last month
- Repository for open inference protocol specification ☆56 Updated 2 weeks ago
- Smart Kubernetes Scheduling ☆80 Updated this week
- Cloud Native Benchmarking of Foundation Models ☆34 Updated 2 weeks ago
- Model Registry provides a single pane of glass for ML model developers to index and manage models, versions, and ML artifacts metadata. I… ☆126 Updated this week
- ☆37 Updated this week
- InstaSlice Operator facilitates slicing of accelerators using stable APIs ☆35 Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 Updated 3 weeks ago
- Example DRA driver that developers can fork and modify to get them started writing their own. ☆73 Updated last week
- Model Server for Kepler ☆27 Updated last week
- Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs. ☆79 Updated this week
- ☆152 Updated this week
- Gateway API Inference Extension ☆304 Updated this week
- Distributed Model Serving Framework ☆167 Updated 2 weeks ago
- K8s device plugin for GPU sharing ☆98 Updated 2 years ago
- This repository contains resources, documentation and artifacts describing LLM agents ☆14 Updated 4 months ago
- ☆38 Updated this week
- Slurm in Kubernetes ☆42 Updated 5 months ago
- ☆34 Updated last week