sgl-project / sgl-cookbook
Make SGLang go brrr
☆33 Updated 3 weeks ago
Alternatives and similar repositories for sgl-cookbook
Users who are interested in sgl-cookbook are comparing it to the libraries listed below.
- ☆50 Updated 4 months ago
- DeeperGEMM: crazy optimized version ☆71 Updated 4 months ago
- ☆95 Updated 6 months ago
- ☆64 Updated 5 months ago
- An experimental communicating attention kernel based on DeepEP. ☆34 Updated 2 months ago
- ☆98 Updated 4 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆71 Updated this week
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆59 Updated 10 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 Updated 2 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆128 Updated 9 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆214 Updated last week
- ☆78 Updated 5 months ago
- JAX backend for SGL ☆64 Updated last week
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆54 Updated last week
- Allow torch tensor memory to be released and resumed later ☆142 Updated last week
- ☆126 Updated 4 months ago
- Quantized Attention on GPU ☆44 Updated 10 months ago
- Estimate MFU for DeepSeekV3 ☆24 Updated 8 months ago
- ☆112 Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆77 Updated last year
- A simple calculation for LLM MFU. ☆46 Updated 3 weeks ago
- Utility scripts for PyTorch (e.g. Memory profiler that understands more low-level allocations such as NCCL) ☆55 Updated 3 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆66 Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆121 Updated 4 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆50 Updated last year
- ☆38 Updated last month
- Framework to reduce autotune overhead to zero for well known deployments. ☆84 Updated 2 weeks ago
- Debug print operator for cudagraph debugging ☆13 Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆130 Updated 2 weeks ago
- ☆30 Updated 3 months ago