Self-host LLMs with vLLM and BentoML
☆170 · Mar 3, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for BentoVLLM
Users who are interested in BentoVLLM are comparing it to the libraries listed below.
- Simple dependency injection framework for Python ☆21 · May 15, 2024 · Updated last year
- 🚀 Launching Bento in a Kubernetes cluster ☆17 · Mar 16, 2025 · Updated last year
- Benchmarking suite for popular AI APIs ☆88 · Feb 6, 2025 · Updated last year
- Cluster doctor skills ☆14 · Feb 20, 2026 · Updated last month
- Open-sourced result for The Agent Company ☆21 · Nov 11, 2025 · Updated 4 months ago
- The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more! ☆8,520 · Updated this week
- Chatbot-to-speech using the Orpheus TTS model. Interactive console app. ☆21 · May 1, 2025 · Updated 10 months ago
- How to build a sentence embedding application using BentoML ☆14 · Mar 31, 2025 · Updated 11 months ago
- LLMPerf is a library for validating and benchmarking LLMs ☆1,092 · Dec 9, 2024 · Updated last year
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆913 · Mar 14, 2026 · Updated last week
- Build a phone-calling voice agent fully powered by open-source models. ☆116 · Apr 18, 2025 · Updated 11 months ago
- PyCon KR 2023 presentation ☆13 · Feb 7, 2024 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 6 months ago
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs ☆27 · Dec 17, 2024 · Updated last year
- Fast model deployment on AWS Lambda ☆14 · Feb 25, 2024 · Updated 2 years ago
- Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud. ☆12,174 · Updated this week
- Sample FastAPI application to demonstrate OpenTelemetry instrumentation ☆19 · Jul 26, 2025 · Updated 7 months ago
- Fine-tune an LLM to perform batch inference and online serving. ☆121 · May 29, 2025 · Updated 9 months ago
- A simple node to download repos from HF: specify a repo ID or file, create a folder where you want to download the files, then rename the fo… ☆25 · Jul 14, 2025 · Updated 8 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆406 · Mar 10, 2026 · Updated last week
- Simple tool to change the INPUT and OUTPUT shape of ONNX. ☆15 · Apr 1, 2025 · Updated 11 months ago
- Evaluating language model answers using a reward model ☆29 · Feb 23, 2024 · Updated 2 years ago
- Aspect-based sentiment analysis aims to detect an aspect (i.e., features) in a given text and then perform sentiment analysis of the text … ☆10 · Nov 15, 2021 · Updated 4 years ago
- 🐳 Build OCI images for Bentos in k8s ☆20 · Mar 5, 2026 · Updated 2 weeks ago
- JAX bindings for the flash-attention3 kernels ☆21 · Jan 2, 2026 · Updated 2 months ago
- Online inference API for NLP Transformer models: summarization, text classification, sentiment analysis, and more ☆45 · Mar 16, 2024 · Updated 2 years ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆55 · Updated this week
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,814 · Mar 2, 2026 · Updated 2 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆220 · Updated this week