vllm-project / ci-infraLinks

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

☆26

Alternatives and similar repositories for ci-infra

Users that are interested in ci-infra are comparing it to the libraries listed below

Sorting:

NVIDIA-NeMo / Automodel
Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆187Updated this week
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆321Updated this week
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130Updated 2 months ago
open-lm-engine / lm-engine
LM engine is a library for pretraining/finetuning LLMs
☆77Updated last week
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆274Updated last week
vllm-project / recipes
Common recipes to run vLLM
☆245Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆132Updated last week
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆45Updated this week
deepseek-ai / LPLB
An early research stage MoE load balancer based on inear programming.
☆415Updated last week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆234Updated this week
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84Updated this week
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆92Updated 8 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆132Updated 11 months ago
run-ai / runai-model-streamer
☆267Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61Updated last week
ailzhang / minPP
Pipeline parallelism for the minimalist
☆37Updated 3 months ago
vllm-project / dashboard
vLLM performance dashboard
☆38Updated last year
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆262Updated this week
huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆34Updated 8 months ago
IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆62Updated 2 months ago
coreweave / ml-containers
☆42Updated this week
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆143Updated 9 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347Updated 6 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆207Updated 6 months ago
NVIDIA-NeMo / Megatron-Bridge
HuggingFace conversion and training library for Megatron-based models
☆228Updated this week
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆546Updated last week
radixark / miles
☆317Updated this week
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆201Updated last week
SalesforceAIResearch / CoDA
Salesforce AI Research's open diffusion language model
☆54Updated last month
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆295Updated this week