☆123Mar 5, 2026Updated this week
Alternatives and similar repositories for nano-sglang
Users who are interested in nano-sglang are comparing it to the libraries listed below.
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model☆76Mar 2, 2026Updated last week
- A simple calculation for LLM MFU.☆69Sep 10, 2025Updated 5 months ago
- Cute layout visualization☆31Jan 18, 2026Updated last month
- A Triton-only attention backend for vLLM☆24Feb 11, 2026Updated 3 weeks ago
- A light llama-like llm inference framework based on the triton kernel.☆173Jan 5, 2026Updated 2 months ago
- ☆27Jan 7, 2025Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework☆48Jan 21, 2026Updated last month
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 3 months ago
- Operator library☆17Jul 9, 2025Updated 8 months ago
- Getting Started with Triton: A Tutorial for Python Beginners☆37Oct 21, 2025Updated 4 months ago
- ☆23Jun 11, 2025Updated 8 months ago
- Large language model deployment on the Ascend 310 chip☆24Jun 14, 2024Updated last year
- ☆155Mar 4, 2025Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper☆108Jul 27, 2018Updated 7 years ago
- ☆21Apr 17, 2025Updated 10 months ago
- Canvas: End-to-End Kernel Architecture Search in Neural Networks☆27Nov 18, 2024Updated last year
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning☆32Jun 13, 2025Updated 8 months ago
- Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.☆173Updated this week
- ☆12Mar 21, 2024Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency☆114Sep 10, 2024Updated last year
- ☆34Feb 3, 2025Updated last year
- ☆25Aug 27, 2021Updated 4 years ago
- LMCache on Ascend☆51Updated this week
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…☆13Jan 16, 2026Updated last month
- ☆53Feb 24, 2026Updated last week
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.☆255Feb 13, 2026Updated 3 weeks ago
- Luthier, a GPU binary instrumentation tool for AMD GPUs☆27Updated this week
- a size profiler for cuda binary☆72Jan 15, 2026Updated last month
- ☆141Apr 23, 2024Updated last year
- ☆83Feb 10, 2026Updated last month
- A non-professional personal translation of "Clang Compiler Frontend"☆38Aug 10, 2024Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 5 months ago
- ☆152Jan 9, 2025Updated last year
- ☆33Dec 10, 2025Updated 2 months ago
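- A duck-shooting game written in Unity, including the core duck-shooting gameplay, resource management, a mobile joystick control module, and other elements.☆11Apr 13, 2022Updated 3 years ago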
- ☆13Jan 16, 2026Updated last month
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate…☆13Dec 31, 2024Updated last year
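- An entry in the 2019 "OPPO TOP University Innovation and Technology Competition": "glasses for the blind", which use a three-way "raspberry-web-app" interaction strategy to provide audiobook reading, navigation, chat, and other functions for blind users☆10Feb 20, 2022Updated 4 years ago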
- Accelerate LLM preference tuning via prefix sharing with a single line of code☆51Jul 4, 2025Updated 8 months ago