openshift-psap / auto-tuning-vllm
Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vLLM + GuideLLM + Optuna)
☆23 · Updated 2 weeks ago
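The stack named in the description (vLLM + GuideLLM + Optuna) amounts to a parameter search: launch a vLLM server with candidate flags, benchmark it, and keep the best configuration. The sketch below illustrates that loop with a plain random search and a synthetic scoring function so it runs standalone; the parameter names (`gpu_memory_utilization`, `max_num_seqs`) are real vLLM server flags, but the objective and search strategy here are placeholder assumptions, not auto-tuning-vllm's actual implementation (which drives GuideLLM load tests and uses Optuna's samplers).

```python
import random

def run_benchmark(gpu_memory_utilization: float, max_num_seqs: int) -> float:
    # Placeholder objective: a real run would start vLLM with these flags,
    # drive load against it with guidellm, and return measured throughput.
    # The synthetic score below just makes the sketch executable.
    return gpu_memory_utilization * 100 - abs(max_num_seqs - 256) * 0.05

def random_search(n_trials: int = 50, seed: int = 0) -> dict:
    """Try n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best = {"score": float("-inf")}
    for _ in range(n_trials):
        params = {
            "gpu_memory_utilization": rng.uniform(0.70, 0.95),
            "max_num_seqs": rng.choice([64, 128, 256, 512]),
        }
        score = run_benchmark(**params)
        if score > best["score"]:
            best = {"score": score, **params}
    return best

best = random_search()
print(best)
```

Swapping the loop for an Optuna study mainly replaces the random draws with `trial.suggest_*` calls and a smarter sampler; the structure (propose, benchmark, compare) stays the same.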
Alternatives and similar repositories for auto-tuning-vllm
Users interested in auto-tuning-vllm are comparing it to the libraries listed below.
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 2 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆730 · Updated last week
- ☆51 · Updated 4 months ago
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆52 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. ☆45 · Updated this week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆44 · Updated this week
- Examples for building and running LLM services and applications locally with Podman ☆184 · Updated 4 months ago
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs. ☆92 · Updated this week
- Route LLM requests to the best model for the task at hand. ☆143 · Updated this week
- A collection of all available inference solutions for LLMs ☆93 · Updated 9 months ago
- ☆268 · Updated last week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆140 · Updated this week
- Self-host LLMs with vLLM and BentoML ☆161 · Updated 2 weeks ago
- Ongoing research training transformer models at scale ☆40 · Updated last week
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 8 months ago
- Python library for Evaluation ☆16 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- Evaluation, benchmark, and scorecard, targeting performance on throughput and latency, accuracy on popular evaluation harnesses, safety… ☆38 · Updated this week
- 📡 Deploy AI models and apps to Kubernetes without developing a hernia ☆33 · Updated last year
- Helm charts for llm-d ☆50 · Updated 4 months ago
- Python library for Synthetic Data Generation ☆51 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆327 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆84 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 · Updated this week
- GitHub bot to assist with the taxonomy contribution workflow ☆17 · Updated last year
- Taxonomy tree that will allow you to create models tuned with your data ☆287 · Updated 3 months ago
- Utils for Unsloth https://github.com/unslothai/unsloth ☆180 · Updated this week
- Benchmark and optimize LLM inference across frameworks with ease ☆141 · Updated 2 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Updated 2 months ago
- Inference server benchmarking tool ☆130 · Updated 2 months ago