NVIDIA / nim-deploy
A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deployment.
☆157 · Updated last week
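Since NIM LLM microservices expose an OpenAI-compatible HTTP API, a deployment rolled out with the charts and manifests in this repo can be smoke-tested with a short client call. The sketch below is illustrative only: the port-forwarded URL and the model name are assumptions, not values defined by nim-deploy, and should be replaced with the ones from your own deployment.

```python
# Minimal sketch of querying a deployed NIM through its OpenAI-compatible endpoint.
# Assumptions (not from this repo): the service is reachable at http://localhost:8000,
# e.g. via `kubectl port-forward`, and serves a model named "meta/llama3-8b-instruct".
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # replace with the model your NIM serves
        "messages": [{"role": "user", "content": "Hello, what can you do?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```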
Alternatives and similar repositories for nim-deploy:
Users who are interested in nim-deploy are comparing it to the libraries listed below:
- Accelerate your Gen AI with NVIDIA NIM and NVIDIA AI Workbench ☆145 · Updated 3 weeks ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆82 · Updated this week
- Infrastructure as code for GPU accelerated managed Kubernetes clusters. ☆50 · Updated 2 months ago
- Run cloud native workloads on NVIDIA GPUs ☆156 · Updated this week
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆53 · Updated this week
- Self-host LLMs with vLLM and BentoML ☆86 · Updated this week
- LLMPerf is a library for validating and benchmarking LLMs ☆738 · Updated 2 months ago
- Repository for open inference protocol specification ☆46 · Updated 6 months ago
- Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips. ☆218 · Updated this week
- Containerization and cloud native suite for OPEA ☆37 · Updated this week
- Hugging Face Deep Learning Containers (DLCs) for Google Cloud ☆141 · Updated 3 weeks ago
- MIG Partition Editor for NVIDIA GPUs ☆186 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆286 · Updated 2 weeks ago
- Community-maintained Kubernetes config and Helm chart for Langfuse ☆76 · Updated 3 weeks ago
- End-to-End LLM Guide ☆101 · Updated 7 months ago
- BigBertha is an architecture design that demonstrates how automated LLMOps (Large Language Models Operations) can be achieved on any Kube… ☆27 · Updated last year
- An NVIDIA AI Workbench example project for Retrieval Augmented Generation (RAG) ☆302 · Updated 2 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆111 · Updated last week
- Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes ☆312 · Updated this week
- Python SDK for Llama Stack ☆128 · Updated this week
- Controller for ModelMesh ☆218 · Updated last month
- A Lightweight Library for AI Observability ☆233 · Updated last week
- Helm charts for the KubeRay project ☆38 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization ☆212 · Updated 2 months ago