sgl-project/whl

SGLang Kernel Wheel Index

☆24

Alternatives and similar repositories for whl

Users who are interested in whl are comparing it to the libraries listed below.

sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
sgl-project / sgl-flash-attn
Fast and memory-efficient exact attention
☆22 · Jun 26, 2026 · Updated 3 weeks ago
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19 · Nov 18, 2024 · Updated last year
BBuf / tensorrt-llm-moe
☆34 · Feb 3, 2025 · Updated last year
WaveSpeedAI / QuantumAttention
[WIP] Better (FP8) attention for Hopper
☆33 · Feb 24, 2025 · Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆21 · Dec 14, 2024 · Updated last year
nicolaswilde / amx-gemm-handwritten
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17 · Jan 11, 2025 · Updated last year
latentCall145 / channels-last-groupnorm
A CUDA kernel for NHWC GroupNorm for PyTorch
☆23 · Nov 15, 2024 · Updated last year
flashinfer-ai / cubloaty
A size profiler for CUDA binaries
☆71 · Jan 15, 2026 · Updated 6 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
sgl-project / sgl-kernel-xpu
SGLang kernel library for Intel XPU
☆27 · Updated this week
VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32 · Dec 21, 2024 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
Oneflow-Inc / dfccl
☆27 · Feb 17, 2025 · Updated last year
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
drarijitdas / Natural-GaLore
An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace
☆19 · Oct 21, 2024 · Updated last year
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆314 · Updated this week
PRIME-RL / P1-VL
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
☆15 · Feb 11, 2026 · Updated 5 months ago
TsinghuaC3I / ZEDA
Post-Trained MoE Can Skip Half Experts via Self-Distillation
☆38 · May 19, 2026 · Updated 2 months ago
KuangjuX / cu-x
🎉My Collections of CUDA Kernels~
☆11 · Jun 25, 2024 · Updated 2 years ago
gallen881 / Physics_Master
Physics Master is a model fine-tuned from llama3-8B-Instruct. It can answer your physics question!
☆16 · Aug 24, 2024 · Updated last year
Zyphra / Zyda_processing
☆44 · Jun 19, 2024 · Updated 2 years ago
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport
☆75 · May 9, 2025 · Updated last year
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆26 · Jan 5, 2025 · Updated last year
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year
togethercomputer / flash-attention-3
Fast and memory-efficient exact attention
☆34 · Dec 2, 2024 · Updated last year
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆47 · Jun 11, 2025 · Updated last year
thomasjoshi / agents-never-forget
☆18 · May 18, 2025 · Updated last year
Dao-AILab / gemm-cublas
☆22 · May 5, 2025 · Updated last year
zhangir-azerbayev / MetaMath
☆11 · Oct 11, 2023 · Updated 2 years ago
uynaes / RankingAwareCLIP
[ICLR'25] Official repository of paper: Ranking-aware adapter for text-driven image ordering with CLIP
☆16 · Apr 17, 2025 · Updated last year
MoonshotAI / Kimi-Researcher
☆80 · Jun 20, 2025 · Updated last year
mikex86 / tritonc
Standalone command-line tool for compiling Triton kernels
☆20 · Sep 13, 2024 · Updated last year
microsoft / TAMAS
☆22 · Dec 15, 2025 · Updated 7 months ago
zzaoen / RdmaAcceleratingRedis
☆21 · Jan 2, 2023 · Updated 3 years ago