bentoml/llm-inference-handbook

Everything you need to know about LLM inference

☆311

Alternatives and similar repositories for llm-inference-handbook

Users who are interested in llm-inference-handbook are comparing it to the repositories listed below.

bentoml / llm-optimizer
Benchmark and optimize LLM inference across frameworks with ease
☆197 · Jul 14, 2026 · Updated last week
bentoml / simple_di
Simple dependency injection framework for Python
☆21 · Jul 14, 2026 · Updated last week
derekburgess / dungen
☆15 · Jul 5, 2025 · Updated last year
yoavg / yoavg.github.io
☆24 · Sep 1, 2025 · Updated 10 months ago
victorb / prompta
☆20 · Mar 20, 2025 · Updated last year
skyzh / tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆4,407 · Updated this week
ome-projects / ome
Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆482 · Updated this week
sushrut141 / pg_analytica
Postgres extension that speeds up analytics queries by up to 90%
☆52 · Jun 8, 2024 · Updated 2 years ago
dtedesco1 / dtedes.co
Personal Site
☆20 · Jun 17, 2026 · Updated last month
little-book-of / linear-algebra
A concise, beginner-friendly introduction to the core ideas of linear algebra.
☆1,989 · Mar 16, 2026 · Updated 4 months ago
ScalingIntelligence / tokasaurus
☆483 · Nov 25, 2025 · Updated 8 months ago
eudoxia0 / airloom
A reverse literate programming tool.
☆16 · Jun 26, 2024 · Updated 2 years ago
rodmena-limited / highway_dsl
A domain-specific language to build complex workflows
☆15 · Jun 26, 2026 · Updated last month
trilogy-data / trilogy-studio-core
Modern, Fast, Fun - A Semantic-First Data/BI Platform
☆15 · Updated this week
C-Naoki / image-stitcher
This is a python implementation for stitching images.
☆229 · Oct 3, 2024 · Updated last year
fillipe-gsm / python-kanban
Terminal user interface for a Kanban board
☆11 · Nov 5, 2021 · Updated 4 years ago
luskits / luscsi
Provides deploy scripts and CSI for Lustre.
☆14 · Apr 13, 2026 · Updated 3 months ago
AnthonyRonning / reflex
☆16 · May 23, 2026 · Updated 2 months ago
Dahrkael / ExTracker
Elixir-powered BitTorrent Tracker
☆340 · Mar 1, 2026 · Updated 4 months ago
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆170 · Updated this week
leftouterjoins / voicewrite
Private, on-device voice-to-text for macOS. Free, open source, no subscription.
☆25 · Jan 10, 2026 · Updated 6 months ago
gogongxt / nano-sglang
☆160 · Mar 5, 2026 · Updated 4 months ago
jerpint / context-llemur
Context management tool for LLM collaboration
☆93 · Aug 19, 2025 · Updated 11 months ago
aperoc / toolkami
Simple Agents Made Easy
☆615 · Mar 16, 2026 · Updated 4 months ago
tonykipkemboi / crewai-anthropic-prompt-caching-cookbook
A comprehensive cookbook demonstrating how to implement CrewAI with Anthropic's prompt caching feature for efficient LLM operations
☆15 · Aug 11, 2025 · Updated 11 months ago
jamesbvaughan / bidirectional-number-editor
A demonstration of text/GUI bi-directional editing via an LSP server
☆38 · Jul 1, 2025 · Updated last year
banagale / slackprep
slackprep is a CLI tool and Python library that converts Slack export data into structured Markdown or JSONL transcripts
☆16 · Sep 8, 2025 · Updated 10 months ago
huggingface / inference-providers-starter-app
☆17 · Oct 21, 2025 · Updated 9 months ago
GeeeekExplorer / nano-vllm
Nano vLLM
☆14,635 · Apr 26, 2026 · Updated 3 months ago
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆652 · Updated this week
maximilianhenning / erasmus-data
☆12 · Jan 11, 2024 · Updated 2 years ago
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,733 · Updated this week
rodlaf / BinaryGPUIndex
A GPU Accelerated Binary Vector Store
☆47 · Feb 17, 2025 · Updated last year
trvon / yams
Persistent memory for LLMs and apps. Content-addressed storage with dedupe, compression, full-text and vector search.
☆375 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆6,032 · Updated this week
joey00072 / Attention-as-graph
An alternative way to calculate self-attention
☆18 · May 25, 2024 · Updated 2 years ago
wemush / open-standard
WeMush Open Labeling Standard
☆48 · Jan 30, 2026 · Updated 5 months ago
dvlshah / tokenx
Python decorators to accurately monitor LLM API cost & latency for OpenAI, Anthropic, and other leading AI models
☆20 · Aug 4, 2025 · Updated 11 months ago
rescrv / napkin
Back-of-the-envelope stuffs in Python
☆20 · Sep 13, 2023 · Updated 2 years ago