This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for details.
☆18Dec 23, 2025Updated 2 months ago
Alternatives and similar repositories for sglang
Users who are interested in sglang are comparing it to the libraries listed below.
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.☆150Nov 3, 2025Updated 4 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆133Dec 3, 2024Updated last year
- ☆16Nov 24, 2025Updated 3 months ago
- ☆13Feb 17, 2025Updated last year
- Port of Facebook's LLaMA model in C/C++☆13Mar 19, 2023Updated 3 years ago
- High-Speed Stateful Packet Processor for Programmable Switches☆14Dec 18, 2022Updated 3 years ago
- Milk-V Duo. Internet access through a USB RNDIS connection to the host machine☆16Jan 11, 2024Updated 2 years ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.☆73Feb 2, 2025Updated last year
- A simple tool for managing sets of environment variables☆16Dec 25, 2025Updated 2 months ago
- Korean UD Treebank.☆22Nov 12, 2025Updated 4 months ago
- pure go for rwkv☆19Dec 31, 2023Updated 2 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated 2 years ago
- My SpaceVim configuration. Clone it into ~/.SpaceVim.d☆10Jan 18, 2026Updated 2 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆34Aug 14, 2024Updated last year
- A simple behavior that can be attached to a Page to display a custom TitleBar with a Full Screen Mode toggle. UWP only.☆12Aug 5, 2015Updated 10 years ago
- ☆11Dec 11, 2024Updated last year
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor☆35Oct 13, 2025Updated 5 months ago
- A tutorial and example of Rust for C++ programmers☆17Sep 21, 2021Updated 4 years ago
- A [real-time] AI voice assistant based on FunASR☆24Dec 18, 2025Updated 3 months ago
- ☆10Nov 21, 2023Updated 2 years ago
- How to plot for papers, slides, demos, etc.☆10Apr 7, 2022Updated 3 years ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem…☆396Apr 20, 2024Updated last year
- [ACL 2021] Learning to Perturb Word Embeddings for Out-of-distribution QA☆16May 11, 2022Updated 3 years ago
- Code for paper: Long cOntext aliGnment via efficient preference Optimization☆24Oct 10, 2025Updated 5 months ago
- The driver for LMCache core to run in vLLM☆62Feb 4, 2025Updated last year
- An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…☆24Feb 25, 2026Updated 3 weeks ago
- A simple extension that uses Bark Text-to-Speech for audio output☆33Nov 3, 2023Updated 2 years ago
- Support for language highlighting of KECC(KAIST Educational C Compiler) IR☆12May 17, 2022Updated 3 years ago
- Upscale Twitch stream and restream into Twitch or RTMP or File.☆16Sep 16, 2023Updated 2 years ago
- ☆106Dec 5, 2025Updated 3 months ago
- ☆23Sep 29, 2024Updated last year
- Fast and memory-efficient exact attention☆20Mar 13, 2026Updated last week
- ☆16Nov 26, 2024Updated last year
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference☆164Oct 13, 2025Updated 5 months ago
- cursor logs with gpt-4o using litellm proxy☆14Sep 9, 2025Updated 6 months ago
- An optimized Merkle Patricia Trie implementation on GPU, fully compatible with and integrable into Ethereum. The paper is published on VL…☆14Apr 15, 2024Updated last year
- ☆10May 15, 2024Updated last year
- python utils☆12Jan 7, 2020Updated 6 years ago
- A code-generating database system with incorporated versioning commands in SQL.☆13Jan 18, 2021Updated 5 years ago