vllm-project / vllm-project.github.io
☆29 · Updated this week
Alternatives and similar repositories for vllm-project.github.io
Users interested in vllm-project.github.io are comparing it to the libraries listed below.
- Pretrain, finetune and serve LLMs on Intel platforms with Ray · ☆131 · Updated 4 months ago
- ☆17 · Updated 7 months ago
- Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… · ☆365 · Updated this week
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines · ☆902 · Updated last week
- PyTorch-native post-training at scale · ☆613 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference · ☆384 · Updated last week
- ☆280 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… · ☆263 · Updated this week
- ☆322 · Updated last year
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond · ☆773 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆830 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ☆475 · Updated this week
- Common recipes to run vLLM (see the minimal generation sketch after this list) · ☆364 · Updated this week
- An early research stage expert-parallel load balancer for MoE models based on linear programming · ☆495 · Updated 2 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… · ☆404 · Updated last month
- The driver for LMCache core to run in vLLM · ☆60 · Updated last year
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… · ☆131 · Updated last month
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support · ☆266 · Updated this week
- A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray · ☆451 · Updated last year
- A tool to configure, launch and manage your machine learning experiments · ☆216 · Updated this week
- 🎉 An awesome & curated list of the best LLMOps tools · ☆190 · Updated this week
- Perplexity open source garden for inference technology · ☆359 · Updated last month
- Perplexity GPU Kernels · ☆554 · Updated 3 months ago
- ☆31 · Updated 9 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆262 · Updated this week
- ☆44 · Updated last week
- Inference server benchmarking tool · ☆142 · Updated 4 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM · ☆220 · Updated last week
- ☆61 · Updated last year
- NVIDIA NCCL Tests for Distributed Training · ☆134 · Updated last week
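
For the "Common recipes to run vLLM" entry above, here is a minimal sketch of offline batch generation with vLLM's Python API, assuming vLLM is installed (`pip install vllm`) and a GPU with enough memory for the chosen model. The model name, prompts, and sampling settings are placeholders for illustration, not anything prescribed by that repository.

```python
# Minimal offline-generation sketch with vLLM's Python API.
# Assumptions: vLLM is installed and the placeholder model fits on the local GPU.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV cache reuse in one sentence.",
    "What does tensor parallelism change about serving?",
]

# Sampling settings are arbitrary placeholders.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loads the model weights and builds the inference engine.
llm = LLM(model="facebook/opt-125m")

# Runs batched generation over all prompts in one call.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```

vLLM also ships an OpenAI-compatible serving entry point, which is where most deployment-style recipes tend to live; the sketch above only covers the offline path.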