sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

☆30,306

Alternatives and similar repositories for sglang

Users who are interested in sglang are comparing it to the libraries listed below.

vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,251 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,957 · Updated this week
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆14,113 · Updated this week
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,953 · Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,452 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,821 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,469 · Updated this week
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,064 · Updated this week
triton-lang / triton
Development repository for the Triton language and compiler
☆19,678 · Updated this week
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆42,707 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,483 · Updated this week
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,871 · Last updated Mar 21, 2026 (3 months ago)
hiyouga / LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆73,271 · Updated this week
ModelTC / LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆4,167 · Updated this week
unslothai / unsloth
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
☆68,204 · Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,710 · Updated this week
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,504 · Updated this week
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,788 · Updated this week
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,844 · Updated this week
deepseek-ai / FlashMLA
FlashMLA: Efficient Multi-head Latent Attention Kernels
☆12,744 · Last updated Apr 30, 2026 (2 months ago)
kvcache-ai / ktransformers
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
☆17,433 · Updated this week
ggml-org / llama.cpp
LLM inference in C/C++
☆120,346 · Updated this week
deepseek-ai / DeepEP
DeepEP: an efficient expert-parallel communication library
☆9,844 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,076 · Updated this week
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,491 · Last updated May 1, 2026 (2 months ago)
xlite-dev / Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,392 · Last updated Jun 23, 2026 (3 weeks ago)
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,438 · Last updated Mar 27, 2024 (2 years ago)
deepseek-ai / open-infra-index
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
☆8,016 · Last updated May 15, 2025 (last year)
deepseek-ai / DeepGEMM
DeepGEMM: clean and efficient BLAS kernel library on GPU
☆7,510 · Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,125 · Updated this week
huggingface / open-r1
Fully open reproduction of DeepSeek-R1
☆26,404 · Last updated Apr 2, 2026 (3 months ago)
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆8,323 · Updated this week
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,394 · Updated this week
LMCache / LMCache
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
☆10,547 · Updated this week
QwenLM / Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
☆27,393 · Last updated Jan 9, 2026 (6 months ago)
EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,279 · Updated this week
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,589 · Last updated Jul 17, 2025 (11 months ago)
ray-project / ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
☆43,248 · Updated this week
mlc-ai / mlc-llm
Universal LLM Deployment Engine with ML Compilation
☆22,948 · Updated this week