bentoml / llm-benchLinks

☆56

Alternatives and similar repositories for llm-bench

Users that are interested in llm-bench are comparing it to the libraries listed below

Sorting:

fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆82Updated last week
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131Updated last month
anyscale / llm-continuous-batching-benchmarks
☆121Updated last year
triton-inference-server / vllm_backend
☆302Updated this week
run-ai / llmperf
☆58Updated last year
neuralmagic / AutoFP8
☆205Updated 5 months ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆54Updated 8 months ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆79Updated 8 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆220Updated last week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆60Updated this week
LLM-inference-router / vllm-router
vLLM Router
☆45Updated last year
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆202Updated 4 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆283Updated this week
vllm-project / recipes
Common recipes to run vLLM
☆172Updated this week
simon-mo / vLLM-Benchmark
☆31Updated 6 months ago
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆318Updated last month
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆215Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266Updated last year
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆248Updated last year
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆104Updated this week
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆142Updated 8 months ago
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆232Updated 10 months ago
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆73Updated last year
vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
☆479Updated 2 weeks ago
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆102Updated last year
MDK8888 / vllmini
A minimal implementation of vllm.
☆58Updated last year
InternLM / turbomind
☆97Updated 7 months ago
triton-inference-server / perf_analyzer
☆114Updated 2 weeks ago
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆180Updated this week
yale-sys / prompt-cache
Modular and structured prompt caching for low-latency LLM inference
☆101Updated 11 months ago