openshift-psap / auto-tuning-vllm
Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vLLM + GuideLLM + Optuna).
☆28 · Updated last week
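The repository's premise, sweeping vLLM server parameters against a benchmark to maximize throughput, can be illustrated with a toy random-search loop. This is a minimal sketch, not the project's actual code: the search-space flags mirror real vLLM options, but the ranges are illustrative assumptions, and the benchmark is simulated (the real workflow would launch a vLLM server and measure it with GuideLLM, typically driven by Optuna rather than random search).

```python
import random

# Hypothetical search space: two vLLM server knobs commonly tuned for
# throughput. The names mirror real vLLM flags, but the candidate
# values are illustrative assumptions, not recommendations.
SEARCH_SPACE = {
    "max_num_seqs": [64, 128, 256, 512],
    "gpu_memory_utilization": [0.80, 0.85, 0.90, 0.95],
}

def benchmark(config):
    """Stand-in for a GuideLLM run: returns a simulated throughput score.

    In the real workflow this would start a vLLM server with `config`
    and run a GuideLLM benchmark against it.
    """
    # Toy model: score grows with batch size and memory headroom.
    return config["max_num_seqs"] * config["gpu_memory_utilization"]

def random_search(trials=20, seed=0):
    """Sample configs at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = benchmark(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search()
print(best_config, best_score)
```

An optimizer like Optuna replaces the random sampling with a smarter search strategy (e.g. TPE) and adds pruning of bad trials, but the objective-function shape, configure, benchmark, score, is the same.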
Alternatives and similar repositories for auto-tuning-vllm
Users that are interested in auto-tuning-vllm are comparing it to the libraries listed below
- ☆51 · Updated 4 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference · ☆62 · Updated 3 months ago
- Examples for building and running LLM services and applications locally with Podman · ☆188 · Updated 4 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆777 · Updated this week
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. · ☆54 · Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data · ☆44 · Updated last week
- Taxonomy tree that will allow you to create models tuned with your data · ☆287 · Updated 3 months ago
- Python library for Synthetic Data Generation · ☆51 · Updated 3 weeks ago
- Route LLM requests to the best model for the task at hand. · ☆147 · Updated last week
- ☆23 · Updated 9 months ago
- Helm charts for llm-d · ☆50 · Updated 5 months ago
- Synthetic Data Generation Toolkit for LLMs · ☆80 · Updated last week
- Large Language Model Text Generation Inference on Habana Gaudi · ☆34 · Updated 9 months ago
- Documentation repository for SGLang, auto-generated from https://github.com/sgl-project/sglang · ☆96 · Updated this week
- Ongoing research training transformer models at scale · ☆42 · Updated 2 weeks ago
- vLLM adapter for a TGIS-compatible gRPC server. · ☆46 · Updated this week
- InstructLab community-wide collaboration space, including contributing, security, code of conduct, etc. · ☆92 · Updated last month
- 📡 Deploy AI models and apps to Kubernetes without developing a hernia · ☆33 · Updated last year
- Benchmark suite for LLMs from Fireworks.ai · ☆84 · Updated last month
- ☆273 · Updated last week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference · ☆357 · Updated this week
- GitHub bot to assist with the taxonomy contribution workflow · ☆17 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM · ☆174 · Updated last week
- Self-host LLMs with vLLM and BentoML · ☆163 · Updated last month
- [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI · ☆49 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆267 · Updated 3 weeks ago
- Core repository for an AI-powered OCP assistant service · ☆61 · Updated last week
- Red Hat Enterprise Linux AI -- Developer Preview · ☆169 · Updated last year
- ☆43 · Updated this week
- Artifacts for the Distributed Workloads stack as part of ODH · ☆33 · Updated last week