DeepAuto-AI/sglang

DeepAuto-AI / sglang

This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for details.

☆18

Alternatives and similar repositories for sglang

Users who are interested in sglang are comparing it to the libraries listed below.

DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆152 · Mar 31, 2026 · Updated 3 months ago
daniel-geon-park / triton_bwd
Automatic differentiation for Triton Kernels
☆29 · Aug 12, 2025 · Updated 11 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆135 · Dec 3, 2024 · Updated last year
Snowflake-Labs / vllm
☆16 · Nov 24, 2025 · Updated 7 months ago
RhinoDevel / mt_llm
Pure C wrapper library to use llama.cpp with Linux and Windows as simple as possible.
☆15 · Jul 11, 2026 · Updated last week
controlecidadao / samantha_ia
Experimental interface environment for open source LLM, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python and…
☆18 · Oct 11, 2025 · Updated 9 months ago
mohamedfawzy96 / ragxo
☆13 · Feb 17, 2025 · Updated last year
D2CampusFest / 5th
5th D2 CAMPUS FEST
☆17 · Oct 29, 2018 · Updated 7 years ago
Zepan / llama.cpp
Port of Facebook's LLaMA model in C/C++
☆13 · Mar 19, 2023 · Updated 3 years ago
FarFetchd / clickitongue
Mic-controlled mouse clicks
☆17 · Oct 6, 2025 · Updated 9 months ago
Ribosome-Packet-Processor / Ribosome
High-Speed Stateful Packet Processor for Programmable Switches
☆13 · Dec 18, 2022 · Updated 3 years ago
suncloudsmoon / quizzer
Generate Duolingo-style quiz courses from PDFs with spaced repetition, adaptive difficulty, and tutor chat.
☆16 · Apr 6, 2026 · Updated 3 months ago
NanXiao / ump
A universal thread-safe memory pool.
☆26 · Jul 20, 2018 · Updated 8 years ago
Dongyeongkim / SWANN-Bindsnet
Implementing Weight Agnostic Neural Networks to Spiking Neural Networks
☆10 · Jan 26, 2021 · Updated 5 years ago
nzdorovylo / Milk-V-Duo--USB-Internet
Milk-V Duo. Access to the Internet through a USB RNDIS connection to the host machine
☆16 · Jan 11, 2024 · Updated 2 years ago
Rivas-AI / HalluDetect
Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach. This repository includes the implementation of…
☆17 · Jun 1, 2024 · Updated 2 years ago
gmlwns2000 / sea-attention
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
☆12 · Jun 20, 2025 · Updated last year
UniversalDependencies / UD_Korean-GSD
Korean UD Treebank.
☆23 · May 6, 2026 · Updated 2 months ago
noosed / NTTuner
GUI tool to QLoRA/LoRA-fine-tune LLMs and deploy to Ollama. Broad GPU support (NVIDIA/AMD/Intel/Apple) + CPU fallback.
☆15 · Feb 18, 2026 · Updated 5 months ago
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆73 · Feb 2, 2025 · Updated last year
BenjaminAster / WebGPU-Mandelbrot
A GPU accelerated Mandelbrot viewer made using the new WebGPU API.
☆10 · Oct 26, 2023 · Updated 2 years ago
JustinXinLiu / FullScreenTitleBarRepo
A simple behavior that can be attached to a Page to display a custom TitleBar with a Full Screen Mode toggle. UWP only.
☆12 · Aug 5, 2015 · Updated 10 years ago
thawk / dotspacevim
My SpaceVim configuration. Clone it into ~/.SpaceVim.d
☆10 · Jan 18, 2026 · Updated 6 months ago
5uck1ess / cicero
Self-hosted voice for coding agents. Talk from any browser or a Telegram call, interrupt mid-sentence, clone any voice, and hand real wor…
☆17 · Updated this week
memerememe / GoBooDo-Linux
A Google book downloader with proxy support. Includes fixes for issues mentioned in the issues page of the original GooBooDoo page + cust…
☆25 · Jan 16, 2025 · Updated last year
m96-chan / 0xBitNet
Run BitNet b1.58 ternary LLMs with WebGPU, in browsers and native apps
☆20 · Mar 8, 2026 · Updated 4 months ago
utilForever / rust-for-cpp
A tutorial and example of Rust for C++ programmers
☆17 · Sep 21, 2021 · Updated 4 years ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆39 · Feb 29, 2024 · Updated 2 years ago
hellangleZ / GPU_capacity_plan_caculator
☆21 · Nov 22, 2025 · Updated 7 months ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆37 · Aug 14, 2024 · Updated last year
LennardF1989 / BG3-BagsOfSorting
A command-line and GUI tool for Baldur's Gate 3 to generate custom inventory bags, modify treasure tables and search for items without ex…
☆14 · Sep 29, 2023 · Updated 2 years ago
seanie12 / SWEP
[ACL 2021] Learning to Perturb Word Embeddings for Out-of-distribution QA
☆16 · May 11, 2022 · Updated 4 years ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆170 · Oct 13, 2025 · Updated 9 months ago
Unmortan-Ellary / Vascura-FRONT
Bloat Free, Portable and Lightweight LLM Frontend (Single HTML file). With Lorebook, Web Search, Macro Engine etc.
☆22 · Updated this week
seasonjs / rwkv
Pure Go for RWKV
☆18 · Dec 31, 2023 · Updated 2 years ago
thunlp / InfLLM
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…
☆405 · Apr 20, 2024 · Updated 2 years ago
XiaoConstantine / rlm-go
Go implementation of Recursive Language Models (RLM) - inference-time scaling for arbitrarily long contexts
☆18 · May 12, 2026 · Updated 2 months ago
BowenforGit / GPU-Joins-Evaluation
Evaluate state-of-the-art GPU joins
☆14 · Nov 29, 2023 · Updated 2 years ago
kaistAI / GAP
[ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization
☆29 · Sep 12, 2024 · Updated last year