sgl-project / sglang-jax
JAX backend for SGL
☆237Updated this week
Alternatives and similar repositories for sglang-jax
Users who are interested in sglang-jax compare it to the libraries listed below.
- Minimal yet performant LLM examples in pure JAX☆242Jan 14, 2026Updated last month
- ☆13Jan 7, 2025Updated last year
- Tensor Parallelism with JAX + Shard Map☆11Sep 29, 2023Updated 2 years ago
- Tokamax: A GPU and TPU kernel library.☆172Updated this week
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- DeeperGEMM: crazy optimized version☆74May 5, 2025Updated 9 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs☆270Feb 2, 2026Updated 2 weeks ago
- Einsum-like high-level array sharding API for JAX☆34Jul 16, 2024Updated last year
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆266Updated this week
- TPU inference for vLLM, with unified JAX and PyTorch support.☆231Updated this week
- Turn jitted jax functions back into python source code☆23Dec 16, 2024Updated last year
- study of cutlass☆22Nov 10, 2024Updated last year
- Tile-based language built for AI computation across all scales☆123Updated this week
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox.☆24Sep 29, 2024Updated last year
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.☆91Dec 17, 2025Updated 2 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- ☆15May 11, 2025Updated 9 months ago
- Tidy autoregressive inference in JAX☆15Sep 1, 2025Updated 5 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference☆79Dec 18, 2025Updated last month
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 6 months ago
- Frechet inception distance (FID) evaluation in JAX☆14May 28, 2024Updated last year
- A Jax/Flax implementation for Korean-language LLM inference on TPU.☆12Jun 12, 2023Updated 2 years ago
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.☆14Jun 4, 2023Updated 2 years ago
- 커버리스트: an AI service for generating book covers☆13Sep 11, 2022Updated 3 years ago
- Kernel Library Wheel for SGLang☆17Updated this week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…☆370Updated this week
- JAX bindings for the flash-attention3 kernels☆20Jan 2, 2026Updated last month
- An efficient method for the conversion from internal to Cartesian coordinates that utilizes the platform-agnostic JAX Python library.☆21Jun 12, 2024Updated last year
- Jax/Flax inference code for StarCoder☆12Jun 12, 2023Updated 2 years ago
- SKT'22 AI Fellowship: development of deep-learning-based colorization technology for black-and-white images☆13Jun 7, 2023Updated 2 years ago
- Serving large language models with transformers☆13Oct 18, 2022Updated 3 years ago
- ☆42Jan 24, 2026Updated 3 weeks ago
- ☆97Mar 26, 2025Updated 10 months ago
- jax-triton contains integrations between JAX and OpenAI Triton☆439Feb 9, 2026Updated last week
- FlashInfer: Kernel Library for LLM Serving☆4,935Feb 10, 2026Updated last week
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.☆57Aug 15, 2025Updated 6 months ago
- CLI Prometheus metrics viewer.☆14May 3, 2023Updated 2 years ago
- Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆15May 10, 2024Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆22Feb 9, 2026Updated last week