JAX backend for SGL
☆243 Mar 3, 2026 Updated last week
Alternatives and similar repositories for sglang-jax
Users that are interested in sglang-jax are comparing it to the libraries listed below
- Minimal yet performant LLM examples in pure JAX ☆240 Jan 14, 2026 Updated last month
- Tokamax: A GPU and TPU kernel library. ☆180 Updated this week
- ☆13 Jan 7, 2025 Updated last year
- DeeperGEMM: crazy optimized version ☆74 May 5, 2025 Updated 10 months ago
- Tensor Parallelism with JAX + Shard Map ☆11 Sep 29, 2023 Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 Nov 19, 2024 Updated last year
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆275 Updated this week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆270 Feb 2, 2026 Updated last month
- Einsum-like high-level array sharding API for JAX ☆34 Jul 16, 2024 Updated last year
- Turn jitted jax functions back into python source code ☆23 Dec 16, 2024 Updated last year
- study of cutlass ☆22 Nov 10, 2024 Updated last year
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆247 Updated this week
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ☆24 Sep 29, 2024 Updated last year
- Convert StableHLO models into Apple Core ML format ☆22 Feb 23, 2026 Updated 2 weeks ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 Jul 4, 2025 Updated 8 months ago
- Tile-based language built for AI computation across all scales ☆138 Feb 27, 2026 Updated last week
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆98 Dec 17, 2025 Updated 2 months ago
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 7 months ago
- Tidy autoregressive inference in JAX ☆15 Sep 1, 2025 Updated 6 months ago
- ☆15 May 11, 2025 Updated 9 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 Dec 18, 2025 Updated 2 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Jul 29, 2025 Updated 7 months ago
- Coverlist (커버리스트): an AI service for generating book covers ☆13 Sep 11, 2022 Updated 3 years ago
- Kernel Library Wheel for SGLang ☆16 Updated this week
- A Jax/Flax implementation for Korean-language LLM inference on TPU. ☆12 Jun 12, 2023 Updated 2 years ago
- Frechet inception distance (FID) evaluation in JAX ☆14 May 28, 2024 Updated last year
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc. ☆14 Jun 4, 2023 Updated 2 years ago
- ☆76 Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,380 Feb 13, 2026 Updated 3 weeks ago
- Serving large language models with transformers ☆13 Oct 18, 2022 Updated 3 years ago
- Jax/Flax inference code for StarCoder ☆12 Jun 12, 2023 Updated 2 years ago
- SKT'22 AI Fellowship: development of deep-learning-based colorization for black-and-white images ☆13 Jun 7, 2023 Updated 2 years ago
- FlashInfer: Kernel Library for LLM Serving ☆5,101 Updated this week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆384 Updated this week
- ☆97 Mar 26, 2025 Updated 11 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆439 Feb 27, 2026 Updated last week
- seqax = sequence modeling + JAX ☆187 Jul 23, 2025 Updated 7 months ago
- slime is an LLM post-training framework for RL Scaling. ☆4,536 Mar 3, 2026 Updated last week
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆58 Aug 15, 2025 Updated 6 months ago