simon-mo / vLLM-Benchmark
☆25 Updated 3 weeks ago
Alternatives and similar repositories for vLLM-Benchmark
Users who are interested in vLLM-Benchmark compare it to the libraries listed below.
- The driver for LMCache core to run in vLLM ☆40 Updated 3 months ago
- A collection of reproducible inference engine benchmarks ☆30 Updated 3 weeks ago
- DeeperGEMM: crazy optimized version ☆69 Updated last week
- ☆84 Updated last month
- extensible collectives library in triton ☆86 Updated last month
- ☆50 Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆73 Updated this week
- ☆32 Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 Updated 5 months ago
- High-performance safetensors model loader ☆30 Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆155 Updated 7 months ago
- ☆34 Updated 4 months ago
- Stateful LLM Serving ☆67 Updated 2 months ago
- ☆36 Updated 5 months ago
- ☆69 Updated last month
- A minimal implementation of vllm. ☆40 Updated 9 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆70 Updated this week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆73 Updated 8 months ago
- ☆26 Updated last year
- LLM Serving Performance Evaluation Harness ☆78 Updated 2 months ago
- ☆11 Updated 4 years ago
- ☆70 Updated last week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆73 Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆68 Updated 2 weeks ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆36 Updated last year
- ☆117 Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆85 Updated this week
- ☆58 Updated 3 weeks ago
- TORCH_LOGS parser for PT2 ☆37 Updated 3 weeks ago
- Lightning In-Memory Object Store ☆45 Updated 3 years ago