sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
☆209 · Updated 2 weeks ago
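To make "token-level performance evaluation" concrete, the sketch below measures the kind of per-request quantities such benchmarks typically report: time to first token (TTFT), mean inter-token latency (TPOT), and output throughput. This is an illustrative example only, not genai-bench's actual API or CLI; the endpoint URL, model name, and payload fields are assumptions for an OpenAI-compatible streaming server.

```python
# Minimal sketch of token-level latency measurement against a hypothetical
# OpenAI-compatible streaming endpoint. Not genai-bench's implementation.
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed serving endpoint
payload = {
    "model": "example-model",          # hypothetical model id
    "prompt": "Explain KV caching.",
    "max_tokens": 128,
    "stream": True,
}

start = time.perf_counter()
chunk_times = []  # arrival time of each streamed text chunk (~1 token each)

with requests.post(ENDPOINT, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events arrive as lines of the form "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"][0].get("text"):
            chunk_times.append(time.perf_counter())

if chunk_times:
    ttft = chunk_times[0] - start          # time to first token
    total = chunk_times[-1] - start        # end-to-end generation latency
    n = len(chunk_times)                   # approximation: one token per chunk
    tpot = (total - ttft) / max(n - 1, 1)  # mean inter-token latency
    print(f"TTFT {ttft * 1000:.1f} ms, "
          f"mean TPOT {tpot * 1000:.1f} ms, "
          f"{n / total:.1f} tokens/s")
```

A real harness would run many such requests concurrently, use a tokenizer rather than counting stream chunks, and aggregate percentiles across requests; this sketch only shows what the token-level metrics mean for a single request.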
Alternatives and similar repositories for genai-bench
Users interested in genai-bench are comparing it to the libraries listed below.
- Allow torch tensor memory to be released and resumed later ☆133 · Updated last week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆416 · Updated 3 months ago
- Perplexity GPU Kernels ☆461 · Updated last month
- A low-latency & high-throughput serving engine for LLMs ☆416 · Updated 3 months ago
- ☆291 · Updated 2 weeks ago
- JAX backend for SGL ☆60 · Updated this week
- ☆121 · Updated last year
- ☆94 · Updated 5 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆400 · Updated this week
- Zero Bubble Pipeline Parallelism ☆426 · Updated 4 months ago
- Materials for learning SGLang ☆572 · Updated 2 weeks ago
- Stateful LLM Serving ☆84 · Updated 6 months ago
- LLM Serving Performance Evaluation Harness ☆79 · Updated 6 months ago
- Efficient and easy multi-instance LLM serving ☆484 · Updated 2 weeks ago
- PyTorch distributed training acceleration framework ☆52 · Updated last month
- A lightweight design for computation-communication overlap. ☆167 · Updated last week
- Applied AI experiments and examples for PyTorch ☆294 · Updated 3 weeks ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆89 · Updated last week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆245 · Updated 2 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆454 · Updated 5 months ago
- Offline optimization of your disaggregated Dynamo graph ☆63 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆239 · Updated this week
- Utility scripts for PyTorch (e.g. a memory profiler that understands low-level allocations such as NCCL) ☆52 · Updated last week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆108 · Updated 4 months ago
- PyTorch bindings for CUTLASS grouped GEMM.