openshift-psap / auto-tuning-vllm
Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vLLM + GuideLLM + Optuna)
☆22 · Updated this week
Alternatives and similar repositories for auto-tuning-vllm
Users interested in auto-tuning-vllm are comparing it to the repositories listed below.
- ☆50 · Updated 3 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆62 · Updated 2 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆700 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM · ☆65 · Updated last week
- llm-d benchmark scripts and tooling · ☆33 · Updated this week
- ☆267 · Updated this week
- ☆56 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai · ☆83 · Updated 2 weeks ago
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes · ☆2,026 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server · ☆44 · Updated this week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) · ☆312 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆266 · Updated last year
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP · ☆52 · Updated last week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray · ☆131 · Updated last month
- ☆312 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend · ☆217 · Updated last year
- Inference server benchmarking tool · ☆128 · Updated last month
- Distributed Model Serving Framework · ☆178 · Updated last month
- Dynamic batching library for deep learning inference, with tutorials for LLM and GPT scenarios · ☆102 · Updated last year
- InstructLab Training Library: efficient fine-tuning with message-format data · ☆44 · Updated this week
- A calculator to estimate the memory footprint, capacity, and latency on VMware Private AI with NVIDIA · ☆37 · Updated 3 months ago
- A tool to detect infrastructure issues on cloud-native AI systems · ☆51 · Updated 2 months ago
- Helm charts for llm-d · ☆50 · Updated 3 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference · ☆300 · Updated this week
- Cloud Native Benchmarking of Foundation Models