NVIDIA / nim-deploy
A collection of YAML files, Helm charts, Operator code, and guides that serve as a reference implementation for NVIDIA NIM deployment.
☆204 · Updated this week
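Since the repository centers on Helm-based NIM deployment, a minimal values-file sketch illustrates what configuring such a chart typically involves. All key names and paths below are illustrative assumptions based on common Helm chart conventions, not verified against the repository's actual chart:

```yaml
# values.yaml — hypothetical values for deploying a NIM via a Helm chart.
# Every key here is an assumption; consult the chart's own values.yaml.
image:
  repository: nvcr.io/nim/meta/llama3-8b-instruct   # example NIM container (assumed path)
  tag: latest
imagePullSecrets:
  - name: ngc-registry-secret     # pull secret for nvcr.io, created beforehand
model:
  ngcAPISecret: ngc-api-secret    # secret holding an NGC API key (assumed key name)
resources:
  limits:
    nvidia.com/gpu: 1             # request one GPU via the NVIDIA device plugin
persistence:
  enabled: true                   # cache downloaded model weights across restarts
  size: 50Gi
```

Such a file would be applied with a standard Helm invocation, e.g. `helm install my-nim ./helm/nim-llm -f values.yaml` (chart path assumed).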
Alternatives and similar repositories for nim-deploy
Users interested in nim-deploy are comparing it to the libraries listed below.
- Accelerate your Gen AI with NVIDIA NIM and NVIDIA AI Workbench · ☆188 · Updated 6 months ago
- An Operator for deploying and maintaining NVIDIA NIMs and NeMo microservices in a Kubernetes environment. · ☆134 · Updated last week
- Infrastructure as code for GPU-accelerated managed Kubernetes clusters. · ☆56 · Updated 6 months ago
- ☆267 · Updated this week
- Run cloud-native workloads on NVIDIA GPUs · ☆205 · Updated last month
- This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval-Augmented Generation (RAG) pipeline. · ☆364 · Updated last week
- AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kub… · ☆324 · Updated 4 months ago
- Repository for the open inference protocol specification · ☆59 · Updated 6 months ago
- Markdown docs · ☆92 · Updated last week
- GenAI components at the microservice level; a GenAI service composer to create a mega-service · ☆184 · Updated last week
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray · ☆131 · Updated last month
- Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or cloud-native. · ☆508 · Updated 6 months ago
- ☆173 · Updated this week
- Helm charts for the KubeRay project · ☆56 · Updated last week
- Self-host LLMs with vLLM and BentoML · ☆156 · Updated 3 weeks ago
- Distributed model serving framework · ☆178 · Updated last month
- Triton CLI is an open-source command-line interface that enables users to create, deploy, and profile models served by the Triton Inferen… · ☆71 · Updated this week
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… · ☆388 · Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆21 · Updated last week
- Community-maintained Kubernetes config and Helm chart for Langfuse · ☆178 · Updated last week
- The NVIDIA AIQToolkit UI streamlines interacting with AIQToolkit workflows in an easy-to-use web application. · ☆51 · Updated last week
- KAI Scheduler is an open-source Kubernetes-native scheduler for AI workloads at large scale · ☆927 · Updated this week
- A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray. · ☆441 · Updated last year
- An NVIDIA AI Workbench example project for fine-tuning a Mistral 7B model · ☆63 · Updated last year
- Hugging Face Deep Learning Containers (DLCs) for Google Cloud · ☆154 · Updated 2 weeks ago
- A top-like tool for monitoring GPUs in a cluster · ☆85 · Updated last year
- ☆111 · Updated 10 months ago
- Containerization and cloud-native suite for OPEA · ☆70 · Updated last month
- Easy, fast, and very cheap training and inference on AWS Trainium and Inferentia chips. · ☆247 · Updated last week
- OpenTelemetry Instrumentation for AI Observability · ☆714 · Updated this week