sgl-project / sglang-jax
JAX backend for SGL
☆211 · Updated this week
Alternatives and similar repositories for sglang-jax
Users who are interested in sglang-jax are comparing it to the libraries listed below.
- Accelerating MoE with IO and Tile-aware Optimizations ☆522 · Updated this week
- Collection of kernels written in Triton language ☆174 · Updated 9 months ago
- Allow torch tensor memory to be released and resumed later ☆196 · Updated last month
- Cataloging released Triton kernels. ☆282 · Updated 4 months ago
- ring-attention experiments ☆161 · Updated last year
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆256 · Updated last month
- Perplexity GPU Kernels ☆548 · Updated 2 months ago
- ☆269 · Updated last week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆250 · Updated 3 weeks ago
- torchcomms: a modern PyTorch communications API ☆319 · Updated this week
- extensible collectives library in triton ☆91 · Updated 9 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆308 · Updated this week
- kernels, of the mega variety ☆640 · Updated 3 months ago
- Autonomous GPU Kernel Generation via Deep Agents ☆202 · Updated this week
- Applied AI experiments and examples for PyTorch ☆312 · Updated 4 months ago
- ☆686 · Updated this week
- Ship correct and fast LLM kernels to PyTorch ☆127 · Updated 3 weeks ago
- Fast low-bit matmul kernels in Triton ☆416 · Updated 3 weeks ago
- Triton-based implementation of Sparse Mixture of Experts. ☆259 · Updated 3 months ago
- ☆100 · Updated last year
- A lightweight design for computation-communication overlap. ☆207 · Updated 2 weeks ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆73 · Updated 3 months ago
- a minimal cache manager for PagedAttention, on top of llama3. ☆130 · Updated last year
- Helpful kernel tutorials and examples for tile-based GPU programming ☆526 · Updated this week
- ☆44 · Updated 9 months ago
- A bunch of kernels that might make stuff slower 😉 ☆73 · Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆433 · Updated last week
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆174 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆139 · Updated 7 months ago
- ☆153 · Updated last year