sgl-project / sglang-jax
JAX backend for SGL
☆34 · Updated last week
Alternatives and similar repositories for sglang-jax
Users interested in sglang-jax are comparing it to the libraries listed below.
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆245 · Updated 2 months ago
- ☆94 · Updated 5 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆59 · Updated 10 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆181 · Updated 11 months ago
- Allow torch tensor memory to be released and resumed later ☆133 · Updated last week
- ☆64 · Updated 4 months ago
- Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆27 · Updated last month
- A simple calculation for LLM MFU. ☆44 · Updated last week
- ☆291 · Updated 2 weeks ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆209 · Updated 2 weeks ago
- ☆42 · Updated last year
- DeeperGEMM: crazy optimized version ☆70 · Updated 4 months ago
- An experimental communicating attention kernel based on DeepEP. ☆34 · Updated last month
- ☆51 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆400 · Updated this week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆264 · Updated 3 months ago
- ☆78 · Updated 5 months ago
- kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference. ☆91 · Updated this week
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆100 · Updated this week
- Make SGLang go brrr ☆29 · Updated last week
- Utility scripts for PyTorch (e.g. a memory profiler that understands lower-level allocations such as NCCL's) ☆52 · Updated last week
- ☆147 · Updated 6 months ago
- ☆38 · Updated last month
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆62 · Updated this week
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆50 · Updated last week
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆120 · Updated 2 weeks ago
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆141 · Updated 4 months ago
- ☆71 · Updated last year
- ☆50 · Updated 4 months ago
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆80 · Updated 3 weeks ago