sgl-project / sgl-cookbook
Cookbook of SGLang - Recipe
☆50 · Updated this week
Alternatives and similar repositories for sgl-cookbook
Users interested in sgl-cookbook are comparing it to the libraries listed below.
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆58 · Updated 2 months ago
- ☆116 · Updated 7 months ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆96 · Updated 4 months ago
- ☆96 · Updated 9 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆110 · Updated last month
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Updated 6 months ago
- Patches for Hugging Face Transformers to save memory ☆32 · Updated 7 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆188 · Updated this week
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Updated last year
- ☆52 · Updated 7 months ago
- ☆59 · Updated 2 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆73 · Updated last week
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on Multi-GPU Clusters ☆53 · Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆138 · Updated last year
- ☆125 · Updated 4 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆198 · Updated last month
- KV cache compression for high-throughput LLM inference ☆148 · Updated 11 months ago
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach… ☆52 · Updated 2 months ago
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth ☆223 · Updated this week
- ☆133 · Updated 7 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆61 · Updated last year
- Fast and memory-efficient exact k-means ☆131 · Updated last month
- ☆63 · Updated 7 months ago
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆136 · Updated last year
- ☆109 · Updated 3 months ago
- ☆79 · Updated last month
- Toolchain built around Megatron-LM for Distributed Training ☆80 · Updated last month
- DPO, but faster 🚀 ☆46 · Updated last year
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆214 · Updated 3 months ago
- ☆441 · Updated 4 months ago