simon-mo / vLLM-BenchmarkLinks

☆31

Alternatives and similar repositories for vLLM-Benchmark

Users that are interested in vLLM-Benchmark are comparing it to the libraries listed below

Sorting:

LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆58Updated 9 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆232Updated this week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆106Updated this week
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆37Updated 7 months ago
deepspeedai / DeepSpeed-Kernels
☆71Updated 7 months ago
foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆72Updated last week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆291Updated this week
microsoft / tokenweave
Efficient Compute-Communication Overlap for Distributed LLM Inference
☆62Updated 3 weeks ago
run-ai / llmperf
☆58Updated last year
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆68Updated 2 weeks ago
vllm-project / flash-attention
Fast and memory-efficient exact attention
☆99Updated last week
OpenSQZ / MegatronApp
Toolchain built around the Megatron-LM for Distributed Training
☆76Updated this week
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆80Updated 9 months ago
InternLM / turbomind
☆97Updated 7 months ago
bentoml / llm-bench
☆56Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆83Updated last week
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆115Updated 6 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆73Updated 6 months ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114Updated 2 years ago
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆363Updated last week
microsoft / AttentionEngine
☆109Updated 6 months ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆385Updated last week
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131Updated 2 months ago
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆232Updated this week
meta-pytorch / BackendBench
How to ensure correctness and ship LLM generated kernels in PyTorch
☆121Updated last week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆70Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆85Updated 2 months ago
anyscale / llm-continuous-batching-benchmarks
☆122Updated last year
Azure / msccl
Microsoft Collective Communication Library
☆66Updated last year
abcdabcd987 / libfabric-efa-demo
☆71Updated 10 months ago