backprop-ai / vllm-benchmark
Benchmarking the serving capabilities of vLLM
☆48 Updated last year
Alternatives and similar repositories for vllm-benchmark
Users who are interested in vllm-benchmark are comparing it to the libraries listed below
- A collection of all available inference solutions for the LLMs ☆91 Updated 5 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆70 Updated last year
- Self-host LLMs with vLLM and BentoML ☆140 Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆80 Updated 3 weeks ago
- ☆63 Updated 5 months ago
- Data preparation code for Amber 7B LLM ☆91 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 Updated 10 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 Updated last week
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models. ☆42 Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆78 Updated 10 months ago
- Simple examples using Argilla tools to build AI ☆53 Updated 9 months ago
- ☆291 Updated 3 weeks ago
- Comparison of Language Model Inference Engines ☆229 Updated 8 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆63 Updated last year
- ☆51 Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆236 Updated 9 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆349 Updated 11 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 Updated last year
- ☆134 Updated last week
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 Updated 4 months ago
- 1.58-bit LLaMa model ☆82 Updated last year
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆89 Updated 2 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆128 Updated 2 weeks ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 Updated 3 months ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆159 Updated last year
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆223 Updated this week
- This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory. ☆64 Updated 10 months ago
- Evaluation of bm42 sparse indexing algorithm ☆68 Updated last year
- FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text) ☆241 Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ☆214 Updated last year