vllm-project / ci-infra
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
☆29 · Updated this week
Alternatives and similar repositories for ci-infra
Users interested in ci-infra are comparing it to the libraries listed below:
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆259 · Updated this week
- LM engine is a library for pretraining/finetuning LLMs ☆113 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server ☆50 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆379 · Updated last week
- Pipeline parallelism for the minimalist ☆38 · Updated 5 months ago
- Memory-optimized Mixture of Experts ☆72 · Updated 6 months ago
- ☆63 · Updated 6 months ago
- ☆87 · Updated last week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 4 months ago
- All information and news regarding the Falcon-H1 series ☆106 · Updated 3 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆205 · Updated last week
- [ICLR 2025] Breaking the throughput-latency trade-off for long sequences with speculative decoding ☆137 · Updated last year
- ☆82 · Updated 2 months ago
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆115 · Updated 2 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆86 · Updated 2 weeks ago
- Codebase for CUDA learning ☆29 · Updated last year
- Perplexity open-source garden for inference technology ☆350 · Updated last month
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs ☆26 · Updated last year
- The source of the LMSYS website and blogs ☆75 · Updated this week
- OLMost every training recipe you need to perform data interventions with the OLMo family of models ☆64 · Updated last week
- Intel Gaudi's Megatron-DeepSpeed for training large language models ☆17 · Updated last year
- A collection of all available inference solutions for LLMs ☆94 · Updated 10 months ago
- ☆278 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Updated 4 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging ☆37 · Updated 3 months ago
- Data recipes and robust infrastructure for training AI agents ☆84 · Updated last week
- KV cache compression for high-throughput LLM inference ☆150 · Updated 11 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆60 · Updated last year
- Google TPU optimizations for transformer models ☆133 · Updated last week
- ☆117 · Updated 3 weeks ago