sgl-project / sglang-jax
JAX backend for SGL
☆71 · Updated this week
Alternatives and similar repositories for sglang-jax
Users interested in sglang-jax are comparing it to the libraries listed below.
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆249 · Updated 3 months ago
- ☆95 · Updated 6 months ago
- Allow torch tensor memory to be released and resumed later ☆144 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆215 · Updated last week
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆129 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆124 · Updated 4 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆58 · Updated 2 weeks ago
- DeeperGEMM: crazy optimized version ☆71 · Updated 5 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆417 · Updated this week
- Utility scripts for PyTorch (e.g. a memory profiler that understands more low-level allocations such as NCCL) ☆55 · Updated last month
- ☆300 · Updated last week
- ☆148 · Updated 7 months ago
- Make SGLang go brrr ☆33 · Updated last week
- ☆64 · Updated 5 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆128 · Updated 10 months ago
- A simple calculation for LLM MFU. ☆46 · Updated last month
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆99 · Updated last week
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆142 · Updated 4 months ago
- [OSDI '24] Serving LLM-based Applications Efficiently with Semantic Variable ☆184 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆59 · Updated 11 months ago
- Perplexity GPU Kernels ☆482 · Updated 3 weeks ago
- Pipeline Parallelism Emulation and Visualization ☆67 · Updated 3 months ago
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆79 · Updated last week
- ☆43 · Updated last year
- An experimental communicating attention kernel based on DeepEP. ☆34 · Updated 2 months ago
- ☆78 · Updated 5 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆44 · Updated 2 weeks ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆215 · Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen, and other MoE models using Megatron Core ☆102 · Updated 3 weeks ago
- A toolchain built around Megatron-LM for distributed training ☆67 · Updated this week
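One entry above is a simple calculator for LLM MFU (Model FLOPs Utilization). The standard back-of-the-envelope formula such tools use can be sketched as follows; the function name and example numbers here are illustrative, not taken from that repository:

```python
def llm_mfu(model_params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Approximate training MFU (illustrative sketch, not from the listed repo).

    Uses the common rule of thumb that one training step costs about
    6 * N FLOPs per token for a dense model with N parameters
    (2N forward + 4N backward).
    """
    achieved_flops_per_sec = 6 * model_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec

# Example: a 7B dense model training at 4000 tokens/s per GPU on a GPU
# with 312 TFLOP/s of BF16 peak throughput (hypothetical numbers).
mfu = llm_mfu(7e9, 4000, 312e12)  # → ~0.54
```

Inference-side MFU is often computed the same way with 2 * N FLOPs per generated token, since there is no backward pass.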