The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
☆402 · Updated this week
Alternatives and similar repositories for worker-vllm
Users who are interested in worker-vllm are comparing it to the libraries listed below.
- Starting point to build your own custom serverless endpoint · ☆133 · Updated May 9, 2025 (9 months ago)
- Create embeddings with Infinity as a serverless endpoint · ☆42 · Updated Nov 21, 2025 (3 months ago)
- SGLang is a fast serving framework for large language models and vision language models. · ☆33 · Updated Nov 24, 2025 (3 months ago)
- RunPod worker for Stable Diffusion XL · ☆42 · Updated Nov 21, 2025 (3 months ago)
- Large-scale LLM inference engine · ☆1,658 · Updated Feb 17, 2026 (last week)
- A curated list of amazing RunPod projects, libraries, and resources · ☆127 · Updated Aug 20, 2024 (last year)
- 1-Click is all you need. · ☆63 · Updated Apr 29, 2024 (last year)
- 🐍 | Python library for RunPod API and serverless worker SDK. · ☆273 · Updated Feb 13, 2026 (2 weeks ago)
- Go ahead and axolotl questions · ☆11,335 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. · ☆52 · Updated this week
- Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, o… · ☆9,478 · Updated this week
- 🧰 | Runpod CLI for pod management · ☆366 · Updated this week
- ComfyUI as a serverless API on RunPod · ☆660 · Updated Feb 12, 2026 (2 weeks ago)
- Large Language Model Text Generation Inference · ☆10,788 · Updated Jan 8, 2026 (last month)
- ⚙️ | REPLACED BY https://github.com/runpod-workers | Official set of serverless workers provided by RunPod as endpoints. · ☆60 · Updated Jun 11, 2025 (8 months ago)
- TheBloke's Dockerfiles · ☆308 · Updated Mar 8, 2024 (last year)
- 🐳 | Dockerfiles for the RunPod container images used for our official templates. · ☆222 · Updated Dec 17, 2025 (2 months ago)
- Speech to Speech conversation using the OpenAI RealTime API in Python 🐍 · ☆26 · Updated Nov 18, 2024 (last year)
- A REST API for vLLM, production ready · ☆26 · Updated Oct 20, 2025 (4 months ago)
- Self-host LLMs with vLLM and BentoML · ☆169 · Updated Jan 21, 2026 (last month)
- The official API server for Exllama. OAI compatible, lightweight, and fast. · ☆1,134 · Updated Feb 9, 2026 (2 weeks ago)
- This project showcases engaging interactions between two AI chatbots. · ☆10 · Updated Jan 10, 2024 (2 years ago)
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆880 · Updated this week
- Tools for merging pretrained large language models. · ☆6,814 · Updated Jan 26, 2026 (last month)
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform · ☆38 · Updated Jan 29, 2024 (2 years ago)
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM · ☆2,787 · Updated this week
- Self-host LLMs with LMDeploy and BentoML · ☆22 · Updated Dec 26, 2025 (2 months ago)
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali · ☆2,688 · Updated Feb 5, 2026 (3 weeks ago)
- SGLang is a high-performance serving framework for large language models and multimodal models. · ☆23,905 · Updated this week
- Gugugo: Korean open-source translation model project · ☆85 · Updated Apr 7, 2024 (last year)
- Loader extension for tabbyAPI in SillyTavern · ☆26 · Updated Jun 30, 2025 (8 months ago)
- Enforce the output format (JSON Schema, Regex, etc.) of a language model · ☆1,992 · Updated Aug 24, 2025 (6 months ago)
- RunPod Serverless Worker for the ComfyUI Stable Diffusion API · ☆21 · Updated Feb 19, 2026 (last week)
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… · ☆12,938 · Updated this week