vllm-project / ci-infra
This repo hosts the code for vLLM's CI and performance benchmark infrastructure.
☆21 · Updated this week
Alternatives and similar repositories for ci-infra
Users interested in ci-infra are comparing it to the libraries listed below.
- Common recipes to run vLLM (see the first sketch after this list) ☆131 · Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding (see the second sketch after this list) ☆127 · Updated 9 months ago
- Simple extension on vLLM to help you speed up reasoning models without training. ☆189 · Updated 3 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆239 · Updated this week
- Make SGLang go brrr ☆30 · Updated last week
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs ☆23 · Updated 9 months ago
- Official implementation for Training LLMs with MXFP4 ☆91 · Updated 4 months ago
- vLLM Router ☆43 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆40 · Updated this week
- KV cache compression for high-throughput LLM inference ☆138 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆132 · Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated 2 weeks ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆48 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 · Updated this week
- Pipeline parallelism for the minimalist ☆34 · Updated last month
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆141 · Updated 4 months ago
- Ongoing research training transformer models at scale ☆30 · Updated this week
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated 2 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆234 · Updated 10 months ago
- A collection of all available inference solutions for LLMs ☆91 · Updated 6 months ago
- This repository contains Dockerfiles, scripts, YAML files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … ☆52 · Updated this week
- Train, tune, and infer the Bamba model ☆132 · Updated 3 months ago
- DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU acceleration, and memory efficiency. ☆78 · Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆66 · Updated last week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆225 · Updated this week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆245 · Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆161 · Updated this week
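
For the "Common recipes to run vLLM" entry above, the shape of a typical recipe can be sketched as follows. This is a minimal offline-inference example assuming vLLM is installed (`pip install vllm`); the model name is an illustrative placeholder, not something prescribed by that repo.

```python
# Minimal vLLM offline-inference sketch; the model choice is illustrative.
from vllm import LLM, SamplingParams

# Load a small model; any Hugging Face model supported by vLLM works here.
llm = LLM(model="facebook/opt-125m")

# Sampling settings: temperature/top_p control randomness, max_tokens caps length.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["The capital of France is", "vLLM is a library for"]
outputs = llm.generate(prompts, params)

for out in outputs:
    print(f"{out.prompt!r} -> {out.outputs[0].text!r}")
```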
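
For the speculative-decoding entry ([ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences), the draft-then-verify loop at the heart of speculative decoding can be sketched as below. This is a generic greedy-acceptance illustration, not the paper's algorithm; `draft_next` and `target_next` are hypothetical stand-ins for real model calls.

```python
# Toy draft-then-verify speculative decoding (generic illustration).
# draft_next / target_next are hypothetical: each maps a token prefix
# to the next greedy token from the small (draft) or large (target) model.
from typing import Callable, List

def speculative_decode(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # cheap proposal model
    target_next: Callable[[List[int]], int],  # expensive reference model
    k: int = 4,                               # tokens drafted per step
    max_new: int = 32,
) -> List[int]:
    tokens = list(prefix)
    while len(tokens) - len(prefix) < max_new:
        # 1) Draft k tokens cheaply with the small model.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) Verify with the target model; keep the longest agreeing prefix.
        #    (Real systems score all k positions in one batched forward pass.)
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        # 3) On a mismatch, take one token from the target to guarantee progress.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens[: len(prefix) + max_new]
```

When draft and target frequently agree, each loop iteration emits several tokens for roughly the cost of one target step, which is where the throughput-latency win comes from.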