backprop-ai / vllm-benchmark
Benchmarking the serving capabilities of vLLM
☆51 Updated last year
Alternatives and similar repositories for vllm-benchmark
Users who are interested in vllm-benchmark are comparing it to the libraries listed below.
- Self-host LLMs with vLLM and BentoML ☆149 Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆72 Updated last year
- A collection of all available inference solutions for the LLMs ☆91 Updated 6 months ago
- Data preparation code for Amber 7B LLM ☆93 Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆132 Updated 2 weeks ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆138 Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours ☆64 Updated last year
- ☆296 Updated last week
- ☆51 Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ☆214 Updated last year
- ☆64 Updated 5 months ago
- FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text) ☆242 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 Updated 11 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 Updated 2 weeks ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆240 Updated 10 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆111 Updated 5 months ago
- Easy and Efficient Quantization for Transformers ☆203 Updated 2 months ago
- vLLM Router ☆43 Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆225 Updated this week
- Self-host LLMs with LMDeploy and BentoML ☆22 Updated 2 months ago
- ☆76 Updated 8 months ago
- ☆135 Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆201 Updated last year
- A pipeline for LLM knowledge distillation ☆107 Updated 5 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆90 Updated last week
- ☆102 Updated last year
- Comparison of Language Model Inference Engines ☆229 Updated 9 months ago
- ☆231 Updated 2 months ago