vllm-project / ci-infra
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
☆21 · Updated this week
Alternatives and similar repositories for ci-infra
Users interested in ci-infra are comparing it to the repositories listed below.
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆132 · Updated 2 weeks ago
- Common recipes to run vLLM ☆154 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆270 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. ☆41 · Updated this week
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆50 · Updated last week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆130 · Updated 10 months ago
- This repository contains Dockerfiles, scripts, yaml files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … ☆52 · Updated this week
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆44 · Updated 2 weeks ago
- Make SGLang go brrr ☆35 · Updated last week
- ☆255 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆69 · Updated last week
- ☆48 · Updated 2 months ago
- The driver for LMCache core to run in vLLM ☆52 · Updated 8 months ago
- Route LLM requests to the best model for the task at hand. ☆108 · Updated 2 weeks ago
- A collection of reproducible inference engine benchmarks ☆33 · Updated 5 months ago
- A collection of all available inference solutions for LLMs ☆91 · Updated 7 months ago
- Pipeline parallelism for the minimalist ☆35 · Updated 2 months ago
- KV cache compression for high-throughput LLM inference ☆141 · Updated 8 months ago
- PyTorch DTensor-native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆90 · Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆235 · Updated 10 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training. ☆196 · Updated 4 months ago
- [WIP] Better (FP8) attention for Hopper ☆33 · Updated 7 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆74 · Updated 3 weeks ago
- ☆43 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated last week
- GPU Environment Management for Visual Studio Code ☆39 · Updated 2 years ago
- ☆36 · Updated 2 weeks ago
- Official implementation for Training LLMs with MXFP4 ☆96 · Updated 5 months ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆23 · Updated 9 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving (see the sketch below). ☆72 · Updated last year
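The last entry describes the common pattern of wrapping a vLLM engine in a Ray Serve deployment. Below is a minimal, hypothetical sketch of that pattern, not code from the listed repository; it assumes recent public APIs of vLLM and Ray Serve, and `facebook/opt-125m` is only a placeholder model name.

```python
# Hypothetical sketch: expose a vLLM engine over HTTP via a Ray Serve deployment.
# Not taken from the listed repository; assumes recent vLLM and Ray Serve releases.
from ray import serve
from starlette.requests import Request
from vllm import LLM, SamplingParams


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self, model_name: str = "facebook/opt-125m"):  # placeholder model
        # Each replica owns its own vLLM engine instance.
        self.llm = LLM(model=model_name)
        self.sampling = SamplingParams(temperature=0.8, max_tokens=128)

    async def __call__(self, request: Request) -> dict:
        # Expects a JSON body like {"prompt": "..."}.
        body = await request.json()
        outputs = self.llm.generate([body["prompt"]], self.sampling)
        return {"text": outputs[0].outputs[0].text}


if __name__ == "__main__":
    # Deploys the replica(s) and routes requests under /generate to __call__.
    serve.run(VLLMDeployment.bind(), route_prefix="/generate")
```

With the deployment running, a request such as `curl -X POST localhost:8000/generate -d '{"prompt": "Hello"}'` returns generated text; scaling out is then mostly a matter of raising `num_replicas`, which is the main reason to pair vLLM with Ray Serve.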