Self-host LLMs with vLLM and BentoML
☆169 · Mar 3, 2026 · Updated 4 months ago
Alternatives and similar repositories for BentoVLLM
Users that are interested in BentoVLLM are comparing it to the libraries listed below.
- Simple dependency injection framework for Python ☆21 · May 15, 2024 · Updated 2 years ago
- ☆56 · Nov 18, 2024 · Updated last year
- ☆11 · Apr 25, 2021 · Updated 5 years ago
- Cluster doctor skills ☆14 · May 23, 2026 · Updated last month
- The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! ☆8,697 · Jun 22, 2026 · Updated last week
- 🚀🚀🚀 [ICML 2026] Official Implementation of Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding? ☆283 · Jun 12, 2026 · Updated 3 weeks ago
- API serving for your diffusers models ☆11 · Jan 19, 2024 · Updated 2 years ago
- Serving CrewAI Agent as REST API with BentoML, optionally with self-host open-source LLMs ☆22 · May 8, 2026 · Updated last month
- ☆12 · Oct 25, 2023 · Updated 2 years ago
- This is "Your Private StackOverflow" app that helps you perform generative search in your code bases. This is built using open-source sta… ☆11 · Aug 14, 2023 · Updated 2 years ago
- Chatbot-to-speech using Orpheus TTS model. Interactive console app. ☆21 · May 1, 2025 · Updated last year
- how to build a sentence embedding application using BentoML ☆15 · Jun 10, 2026 · Updated 3 weeks ago
- AutoML 2024: HPOD: Hyperparameter Optimization for Unsupervised Outlier Detection ☆13 · Jul 12, 2024 · Updated last year
- LLMPerf is a library for validating and benchmarking LLMs ☆1,123 · Dec 9, 2024 · Updated last year
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated last year
- ☆16 · Feb 21, 2026 · Updated 4 months ago
- ☆12 · Jul 5, 2023 · Updated 2 years ago
- Pycon KR 2023 presentation ☆13 · Feb 7, 2024 · Updated 2 years ago
- ☆12 · May 20, 2022 · Updated 4 years ago
- The search for the best Conversational AI pipeline ☆14 · May 11, 2020 · Updated 6 years ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆1,307 · Jun 27, 2026 · Updated last week
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- ☆346 · Jun 9, 2026 · Updated 3 weeks ago
- ☆23 · May 23, 2025 · Updated last year
- Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud. ☆12,381 · Updated this week
- Fine-tune an LLM to perform batch inference and online serving. ☆121 · May 29, 2025 · Updated last year
- ☆20 · Jun 27, 2026 · Updated last week
- The Runpod worker template for serving our large language model endpoints. Powered by vLLM. ☆456 · Jun 26, 2026 · Updated last week
- Test Orchestrator for Performance and Scalability of AI pLatforms ☆18 · Jun 23, 2026 · Updated last week
- VS Code extension for creating remote workstations (sessions) using ClearML. ☆16 · Mar 14, 2024 · Updated 2 years ago
- A high-performance batching router that optimises max throughput for text inference workloads ☆16 · Sep 6, 2023 · Updated 2 years ago
- Advanced vm/sandbox for Node.js ☆18 · Jun 11, 2026 · Updated 3 weeks ago
- Evaluating language model responses using a reward model ☆30 · Feb 23, 2024 · Updated 2 years ago
- SOC Analyst Level 1 Replacement using RAG LLM ☆28 · Aug 16, 2024 · Updated last year
- 🐳 Build OCI images for Bentos in k8s ☆19 · Mar 24, 2026 · Updated 3 months ago
- JAX bindings for the flash-attention3 kernels ☆24 · Jan 2, 2026 · Updated 6 months ago
- ☆13 · Jun 5, 2024 · Updated 2 years ago
- ☆48 · Nov 8, 2023 · Updated 2 years ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆55 · Updated this week