backprop-ai / vllm-benchmark
Benchmarking the serving capabilities of vLLM
★42 · Updated 8 months ago
Alternatives and similar repositories for vllm-benchmark:
Users interested in vllm-benchmark are comparing it to the repositories listed below.
- ★53 · Updated 10 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ★65 · Updated last year
- Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models. ★136 · Updated 9 months ago
- Data preparation code for the Amber 7B LLM. ★88 · Updated 11 months ago
- A collection of all available inference solutions for LLMs. ★86 · Updated last month
- Benchmark suite for LLMs from Fireworks.ai. ★70 · Updated 2 months ago
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ★82 · Updated last month
- Utils for Unsloth. ★73 · Updated last week
- Simple examples using Argilla tools to build AI. ★52 · Updated 5 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★34 · Updated 4 months ago
- Machine learning serving focused on GenAI, with simplicity as the top priority. ★58 · Updated 2 weeks ago
- EvolKit is a framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models. ★213 · Updated 5 months ago
- Client code examples, use cases, and benchmarks for the enterprise h2oGPTe RAG-based GenAI platform. ★87 · Updated this week
- ★24 · Updated 2 months ago
- TextEmbed is a REST API built for high-throughput, low-latency embedding inference. It accommodates a wide variety of embedding models. ★23 · Updated 7 months ago
- Lightweight demos for fine-tuning LLMs. Powered by 🤗 transformers and open-source datasets. ★76 · Updated 6 months ago
- Data preparation code for the CrystalCoder 7B LLM. ★44 · Updated 11 months ago
- ★113 · Updated 2 weeks ago
- ★246 · Updated last week
- ★50 · Updated 5 months ago
- Function-calling benchmark & testing. ★87 · Updated 9 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference. ★60 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ★262 · Updated 6 months ago
- vLLM performance dashboard. ★27 · Updated last year
- A framework for evaluating function calls made by LLMs. ★37 · Updated 9 months ago
- vLLM Router. ★26 · Updated last year
- Sentence Transformers API: an OpenAI-compatible embedding API server. ★54 · Updated 7 months ago
- Using LlamaIndex with Ray for productionizing LLM applications. ★71 · Updated last year
- Self-host LLMs with vLLM and BentoML. ★106 · Updated last week
- ★53 · Updated 7 months ago
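Several of the entries above (vllm-benchmark itself, the Fireworks.ai benchmark suite, the vLLM performance dashboard) report the same core serving metrics: latency percentiles and token throughput over a run. Below is a minimal sketch of how those numbers are typically computed from per-request timings. The function name and the nearest-rank percentile method are illustrative assumptions, not code taken from any listed repository.

```python
import math
import statistics


def summarize_run(latencies_s, total_tokens, wall_time_s):
    """Summarize a serving-benchmark run from per-request measurements.

    latencies_s:  end-to-end latency of each request, in seconds
    total_tokens: tokens generated across all requests in the run
    wall_time_s:  wall-clock duration of the whole run, in seconds
    """
    xs = sorted(latencies_s)

    def pct(p):
        # Nearest-rank percentile: smallest value with at least p% of
        # observations at or below it.
        return xs[max(0, math.ceil(p / 100 * len(xs)) - 1)]

    return {
        "requests": len(xs),
        "mean_latency_s": statistics.fmean(xs),
        "p50_latency_s": pct(50),
        "p99_latency_s": pct(99),
        "throughput_tok_per_s": total_tokens / wall_time_s,
        "requests_per_s": len(xs) / wall_time_s,
    }
```

For example, four requests taking 0.1 s to 0.4 s that together produce 400 tokens in a 2-second run yield 200 tokens/s and 2 requests/s, with a p99 latency equal to the slowest request.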