sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆2,830Updated this week
Alternatives and similar repositories for mini-sglang
Users who are interested in mini-sglang compare it to the libraries listed below.
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels☆4,417Updated this week
- slime is an LLM post-training framework for RL Scaling.☆3,224Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs.☆1,217Updated 4 months ago
- Materials for learning SGLang☆709Updated 3 weeks ago
- FlashInfer: Kernel Library for LLM Serving☆4,438Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer☆6,657Updated this week
- Distributed Compiler based on Triton for Parallel Systems☆1,307Updated last week
- Nano vLLM☆10,620Updated 2 months ago
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs☆1,722Updated 2 weeks ago
- My learning notes for ML SYS.☆4,922Updated this week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization☆2,082Updated 2 weeks ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel☆2,032Updated 2 weeks ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines☆885Updated this week
- Scalable toolkit for efficient model reinforcement☆1,210Updated this week
- A framework for efficient model inference with omni-modality models☆1,977Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving.☆615Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆4,538Updated this week
- A throughput-oriented high-performance serving framework for LLMs☆929Updated 2 months ago
- Puzzles for learning Triton, play it with minimal environment configuration!☆584Updated last week
- Expert Parallelism Load Balancer☆1,329Updated 9 months ago
- A self-learning tutorial for CUDA High Performance Programming.☆803Updated 6 months ago
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs☆915Updated last month
- Disaggregated serving system for Large Language Models (LLMs).☆758Updated 9 months ago
- Fast, Flexible and Portable Structured Generation☆1,447Updated last week
- Analyze computation-communication overlap in V3/R1.☆1,130Updated 9 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes☆1,939Updated 4 months ago
- ☆686Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM☆2,514Updated last week
- Efficient and easy multi-instance LLM serving☆519Updated 4 months ago
- A Datacenter Scale Distributed Inference Serving Framework☆5,716Updated this week