simon-mo / vLLM-Benchmark
☆28 Updated 2 months ago
Alternatives and similar repositories for vLLM-Benchmark
Users who are interested in vLLM-Benchmark compare it to the libraries listed below.
- A collection of reproducible inference engine benchmarks ☆31 Updated 2 months ago
- The driver for LMCache core to run in vLLM ☆42 Updated 4 months ago
- ☆26 Updated 3 months ago
- DeeperGEMM: crazy optimized version ☆69 Updated last month
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated last year
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆28 Updated 2 months ago
- ☆86 Updated 3 months ago
- High-performance safetensors model loader ☆40 Updated this week
- ☆44 Updated 3 weeks ago
- ☆39 Updated 5 months ago
- ☆35 Updated last month
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆87 Updated last month
- ☆37 Updated 6 months ago
- ☆11 Updated 4 years ago
- extensible collectives library in triton ☆86 Updated 2 months ago
- KV cache store for distributed LLM inference ☆269 Updated 2 weeks ago
- ☆54 Updated 7 months ago
- ☆55 Updated 9 months ago
- Fast and memory-efficient exact attention ☆76 Updated this week
- Stateful LLM Serving ☆73 Updated 3 months ago
- ☆72 Updated 3 months ago
- Microsoft Collective Communication Library ☆64 Updated 7 months ago
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL ☆19 Updated last month
- Home for OctoML PyTorch Profiler ☆113 Updated 2 years ago
- ☆60 Updated 2 months ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 Updated 2 years ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 Updated 6 months ago
- Lightning In-Memory Object Store ☆46 Updated 3 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆116 Updated 7 months ago
- ☆37 Updated this week