openshift-psap / auto-tuning-vllm
Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vLLM + GuideLLM + Optuna).
☆32 · Updated this week
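The tagline suggests the overall pattern: sweep vLLM server settings with Optuna and score each candidate with a load-generation run (e.g. GuideLLM). Below is a minimal sketch of that loop. The Optuna calls and the vLLM parameter names are real, but `run_benchmark` is a hypothetical stand-in with a placeholder score, and the search ranges are illustrative assumptions, not the repository's actual API or recommended values.

```python
import optuna


def run_benchmark(config: dict) -> float:
    # Placeholder score so the sketch runs end-to-end; a real harness would
    # launch a vLLM server with `config`, drive load against it (e.g. with
    # GuideLLM), and return a metric such as output tokens per second.
    return config["gpu_memory_utilization"] * config["max_num_seqs"]


def objective(trial: optuna.Trial) -> float:
    # The keys correspond to real vLLM engine arguments; the ranges here
    # are illustrative assumptions for the sketch.
    config = {
        "gpu_memory_utilization": trial.suggest_float("gpu_memory_utilization", 0.70, 0.95),
        "max_num_seqs": trial.suggest_int("max_num_seqs", 64, 512, step=64),
        "max_num_batched_tokens": trial.suggest_categorical(
            "max_num_batched_tokens", [2048, 4096, 8192, 16384]
        ),
    }
    return run_benchmark(config)


study = optuna.create_study(direction="maximize")  # maximize throughput
study.optimize(objective, n_trials=25)
print("Best configuration found:", study.best_params)
```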
Alternatives and similar repositories for auto-tuning-vllm
Users interested in auto-tuning-vllm are comparing it to the repositories listed below.
- ☆51 · Updated 5 months ago
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆56 · Updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆819 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Updated 4 months ago
- Taxonomy tree that will allow you to create models tuned with your data ☆290 · Updated 4 months ago
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆47 · Updated this week
- Python library for Synthetic Data Generation ☆52 · Updated last month
- ☆278 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server. ☆50 · Updated this week
- GitHub bot to assist with the taxonomy contribution workflow ☆17 · Updated last year
- Examples for building and running LLM services and applications locally with Podman ☆190 · Updated 5 months ago
- Self-host LLMs with vLLM and BentoML ☆167 · Updated last week
- Synthetic Data Generation Toolkit for LLMs ☆88 · Updated last week
- Route LLM requests to the best model for the task at hand. ☆171 · Updated 2 weeks ago
- Inference server benchmarking tool ☆141 · Updated 3 months ago
- Python library for Evaluation ☆16 · Updated this week
- llm-d benchmark scripts and tooling ☆42 · Updated this week
- A collection of all available inference solutions for LLMs ☆94 · Updated 10 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 · Updated last year
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆155 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆205 · Updated last week
- Distributed Model Serving Framework ☆182 · Updated 4 months ago
- ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing ☆88 · Updated 2 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆282 · Updated this week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆365 · Updated this week
- GenAI Studio is a low code platform to enable users to construct, evaluate, and benchmark GenAI applications. The platform also provide c… ☆58 · Updated 2 weeks ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆379 · Updated last week
- ☆56 · Updated last year
- GenAI components at micro-service level; GenAI service composer to create mega-service ☆193 · Updated last week
- ☆23 · Updated 10 months ago