micytao / vllm-playground
A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and enterprise deployment on OpenShift/Kubernetes.
☆172 · Updated this week
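Since vllm-playground is a front end for a running vLLM server, the snippet below is a minimal sketch of the kind of interaction such a UI wraps: calling a vLLM server's OpenAI-compatible endpoint. The URL, port, and model name are illustrative assumptions (a server started separately, e.g. with `vllm serve <model>`), not details taken from this repository.

```python
# Minimal sketch: querying a vLLM server's OpenAI-compatible API.
# Assumptions: a vLLM server is already running locally on port 8000;
# the model name below is illustrative, not from vllm-playground itself.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hello from the playground"}],
)
print(resp.choices[0].message.content)
```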
Alternatives and similar repositories for vllm-playground
Users interested in vllm-playground are comparing it to the libraries listed below.
- A command-line interface tool for serving LLMs using vLLM. ☆456 · Updated 3 weeks ago
- Common recipes to run vLLM ☆305 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆354 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆765 · Updated this week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆174 · Updated last week
- Benchmark and optimize LLM inference across frameworks with ease ☆151 · Updated 3 months ago
- Route LLM requests to the best model for the task at hand. ☆145 · Updated this week
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆871 · Updated this week
- ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows. ☆407 · Updated last week
- ☆235 · Updated last month
- An early-research-stage expert-parallel load balancer for MoE models, based on linear programming. ☆476 · Updated last month
- ☆273 · Updated last week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆726 · Updated 3 weeks ago
- Code to accompany the Universal Deep Research paper (https://arxiv.org/abs/2509.00244) ☆451 · Updated 4 months ago
- Self-host LLMs with vLLM and BentoML ☆162 · Updated last month
- The driver for LMCache core to run in vLLM ☆59 · Updated 10 months ago
- Developer Asset Hub for NVIDIA Nemotron: a one-stop resource for training recipes, usage cookbooks, and full end-to-end reference examples ☆246 · Updated this week
- Python Implementation of MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) ☆382 · Updated 2 weeks ago
- Inference, fine-tuning, and many more recipes for the Gemma family of models ☆276 · Updated 5 months ago
- The LLM abstraction layer for modern AI agent applications. ☆500 · Updated this week
- A framework for efficient inference with omni-modality models ☆1,335 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆263 · Updated this week
- ☆239 · Updated 2 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆273 · Updated this week
- Inference server benchmarking tool ☆132 · Updated 2 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving (a minimal sketch follows this list). ☆78 · Updated last year
- ☆321 · Updated last week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆341 · Updated last week
- Efficient LLM Inference over Long Sequences ☆394 · Updated 6 months ago
- Perplexity's open-source garden for inference technology ☆313 · Updated this week
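For the vLLM + Ray Serve entry above, a minimal sketch of one way such an integration can look is shown below: a Ray Serve deployment that holds a vLLM engine in each replica. The model name, replica count, and GPU setting are illustrative assumptions; this is not code from the linked project.

```python
# Minimal sketch: vLLM wrapped in a Ray Serve deployment (illustrative only).
# Assumptions: ray[serve] and vllm installed, a small open model, one GPU replica.
from ray import serve
from starlette.requests import Request
from vllm import LLM, SamplingParams


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMService:
    def __init__(self, model: str = "facebook/opt-125m"):
        # Each replica owns its own vLLM engine.
        self.llm = LLM(model=model)

    async def __call__(self, request: Request):
        body = await request.json()
        params = SamplingParams(max_tokens=int(body.get("max_tokens", 64)))
        # Blocking generate call; acceptable for a sketch, not for production throughput.
        result = self.llm.generate([body["prompt"]], params)[0]
        return {"text": result.outputs[0].text}


app = VLLMService.bind()
# serve.run(app)  # exposes an HTTP endpoint (default http://localhost:8000)
```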