sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
☆195 · Updated last week
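The token-level metrics such a tool reports (time to first token, time per output token, aggregate throughput) can be sketched generically. The names `RequestTrace`, `ttft`, and `tpot` below are illustrative assumptions for this sketch, not genai-bench's actual API:

```python
from dataclasses import dataclass

@dataclass
class RequestTrace:
    start: float          # time request was sent (seconds)
    first_token: float    # time first output token arrived
    end: float            # time last output token arrived
    output_tokens: int    # number of generated tokens

def ttft(t: RequestTrace) -> float:
    """Time to first token: prefill + scheduling latency."""
    return t.first_token - t.start

def tpot(t: RequestTrace) -> float:
    """Time per output token after the first (decode latency)."""
    if t.output_tokens <= 1:
        return 0.0
    return (t.end - t.first_token) / (t.output_tokens - 1)

def throughput(traces: list[RequestTrace]) -> float:
    """Aggregate output tokens per second over the whole run."""
    total_tokens = sum(t.output_tokens for t in traces)
    span = max(t.end for t in traces) - min(t.start for t in traces)
    return total_tokens / span if span > 0 else 0.0
```

For a request that starts at t=0.0s, emits its first token at 0.5s, and finishes 21 tokens at 2.5s, `ttft` is 0.5s and `tpot` is 0.1s/token.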
Alternatives and similar repositories for genai-bench
Users interested in genai-bench are comparing it to the repositories listed below.
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆407 · Updated 3 months ago
- Allow torch tensor memory to be released and resumed later ☆115 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆449 · Updated 3 weeks ago
- A low-latency & high-throughput serving engine for LLMs ☆408 · Updated 3 months ago
- LLM Serving Performance Evaluation Harness ☆79 · Updated 6 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆325 · Updated last week
- Materials for learning SGLang ☆549 · Updated last week
- Stateful LLM Serving ☆81 · Updated 5 months ago
- Zero Bubble Pipeline Parallelism ☆421 · Updated 3 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆238 · Updated last month
- Efficient and easy multi-instance LLM serving ☆473 · Updated last week
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆449 · Updated 4 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆223 · Updated this week
- A minimal implementation of vLLM. ☆51 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆214 · Updated last year
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆180 · Updated 11 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆319 · Updated last year
- PyTorch distributed training acceleration framework ☆52 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆91 · Updated this week
- Applied AI experiments and examples for PyTorch ☆292 · Updated last week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆128 · Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆115 · Updated last week
- Common recipes to run vLLM ☆110 · Updated last week
- The driver for LMCache core to run in vLLM ☆47 · Updated 6 months ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆224 · Updated last month