simon-mo / vLLM-Benchmark
☆27 · Updated last month
Alternatives and similar repositories for vLLM-Benchmark
Users who are interested in vLLM-Benchmark are comparing it to the libraries listed below.
- The driver for LMCache core to run in vLLM · ☆41 · Updated 4 months ago
- A collection of reproducible inference engine benchmarks · ☆31 · Updated last month
- ☆25 · Updated 3 months ago
- High-performance safetensors model loader · ☆36 · Updated this week
- ☆54 · Updated 8 months ago
- ☆85 · Updated 2 months ago
- Stateful LLM Serving · ☆70 · Updated 2 months ago
- LLM Serving Performance Evaluation Harness · ☆78 · Updated 3 months ago
- KV cache store for distributed LLM inference · ☆254 · Updated last week
- PyTorch distributed training acceleration framework · ☆49 · Updated 3 months ago
- ☆37 · Updated 5 months ago
- DeeperGEMM: crazy optimized version · ☆69 · Updated last month
- ☆37 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai · ☆75 · Updated 3 weeks ago
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. · ☆29 · Updated 2 months ago
- ☆41 · Updated last week
- extensible collectives library in triton · ☆87 · Updated 2 months ago
- ☆119 · Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. · ☆81 · Updated 2 weeks ago
- Lightning In-Memory Object Store · ☆46 · Updated 3 years ago
- Home for OctoML PyTorch Profiler · ☆113 · Updated 2 years ago
- ☆34 · Updated last week
- ☆52 · Updated 6 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable · ☆158 · Updated 8 months ago
- Microsoft Collective Communication Library · ☆65 · Updated 6 months ago
- Fast and memory-efficient exact attention · ☆72 · Updated last month
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. · ☆100 · Updated last year
- Perplexity GPU Kernels · ☆324 · Updated 2 weeks ago
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL · ☆19 · Updated last month
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆116 · Updated 6 months ago