bentoml/llm-optimizer

Benchmark and optimize LLM inference across frameworks with ease

☆197

Alternatives and similar repositories for llm-optimizer

Users who are interested in llm-optimizer are comparing it to the libraries listed below.

bentoml / IF-multi-GPUs-demo
☆12 · Jul 5, 2023 · Updated 3 years ago
bentoml / simple_di
Simple dependency injection framework for Python
☆21 · Jul 14, 2026 · Updated last week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆314 · Updated this week
bentoml / BentoSentenceTransformers
How to build a sentence embedding application using BentoML
☆15 · Jul 14, 2026 · Updated last week
bentoml / yatai-image-builder
🐳 Build OCI images for Bentos in k8s
☆19 · Jul 14, 2026 · Updated last week
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆1,429 · Updated this week
ome-projects / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆482 · Updated this week
openshift-psap / auto-tuning-vllm
Auto-tuning for vllm. Getting the best performance out of your LLM deployment (vllm+guidellm+optuna)
☆64 · Jun 12, 2026 · Updated last month
triton-inference-server / perf_analyzer
☆151 · Updated this week
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆982 · Jul 4, 2026 · Updated 3 weeks ago
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆723 · Updated this week
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆462 · Jul 14, 2026 · Updated last week
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,875 · Updated this week
liushulinle / MarsRL
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
☆18 · Nov 18, 2025 · Updated 8 months ago
llm-d / llm-d-benchmark
llm-d benchmark scripts and tooling
☆62 · Updated this week
opea-project / Enterprise-Inference
Intel® AI for Enterprise Inference optimizes AI inference services on Intel hardware using Kubernetes Orchestration. It automates LLM mod…
☆44 · Jul 8, 2026 · Updated 2 weeks ago
aws-samples / sagemaker-bencher
☆13 · Nov 1, 2024 · Updated last year
bentoml / BentoChatTTS
☆29 · Jul 14, 2026 · Updated last week
FogDong / soleclaw
A self-evolving personal AI assistant.
☆39 · Mar 13, 2026 · Updated 4 months ago
DaoCloud / ckube
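High-performance proxy component for the Kubernetes APIServer. It proxies the APIServer's List requests, while other request types are reverse-proxied directly to the native APIServer. CKube also adds support for pagination, search, and indexing. In addition, CKube is 100% compatible with native kubectl and ku…
☆19 · Sep 16, 2022 · Updated 3 years ago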
argoproj-labs / rollouts-plugin-trafficrouter-contour
The Argo Rollouts plugin implementing the Contour HTTPProxy traffic control in progressive delivery scenarios.
☆19 · Nov 26, 2024 · Updated last year
Tomiinek / WaveRNN
WaveRNN Vocoder + TTS
☆11 · Nov 20, 2021 · Updated 4 years ago
llm-d / llm-d-routing-sidecar
Incubating P/D sidecar for llm-d
☆17 · Nov 13, 2025 · Updated 8 months ago
llm-d / llm-d-deployer
Helm charts for llm-d
☆52 · Jul 22, 2025 · Updated last year
ai-dynamo / aiperf
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…
☆468 · Updated this week
szaman19 / Materials-Search
Applying machine learning methodologies in search of novel MOFs and battery materials.
☆14 · May 31, 2023 · Updated 3 years ago
bentoml / BentoColPali
☆26 · Jul 14, 2026 · Updated last week
vllm-project / production-stack
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
☆2,474 · Updated this week
run-ai / runai-model-streamer
☆330 · Updated this week
songys / 2021Langcon
☆11 · Oct 3, 2021 · Updated 4 years ago
rh-ai-quickstart / ai-observability-summarizer
AI quickstart that provides an interactive dashboard to analyze AI model performance as well as OpenShift metrics collected from Prometheus
☆25 · Jun 9, 2026 · Updated last month
LMCache / lmcache_frontend
The frontend of LMCache
☆18 · Apr 22, 2026 · Updated 3 months ago
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆374 · Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆652 · Updated this week
JoungheeKim / kor-spacing
This is a project for Korean auto spacing
☆12 · Aug 3, 2020 · Updated 5 years ago
LMCache / LMCache
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
☆10,880 · Updated this week
kubean-io / kube-node-tuning
Manage Kubernetes node-level kernel tuning (using sysctl).
☆30 · Nov 21, 2025 · Updated 8 months ago
upskyy / Paper-Review
Paper Review about Speech Recognition · NLP
☆10 · Mar 25, 2021 · Updated 5 years ago
tunib-ai / joker
AI model designed to test the effectiveness in handling external ethical attacks.
☆11 · Feb 9, 2026 · Updated 5 months ago