☆144Mar 5, 2026Updated 2 months ago
Alternatives and similar repositories for nano-sglang
Users who are interested in nano-sglang are comparing it to the libraries listed below.
- Write Makefiles with Me (Markdown remake)☆15May 29, 2024Updated last year
- Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language☆150Apr 2, 2026Updated last month
- Cute layout visualization☆38Jan 18, 2026Updated 3 months ago
- A lightweight LLaMA-like LLM inference framework based on Triton kernels.☆185Jan 5, 2026Updated 4 months ago
- A Triton-only attention backend for vLLM☆25Mar 17, 2026Updated last month
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model☆114Apr 28, 2026Updated last week
- Large language model deployment on the Ascend 310 chip☆25Jun 14, 2024Updated last year
- A simple calculation for LLM MFU.☆77Sep 10, 2025Updated 8 months ago
- 🎓Automatically Update circult-eda-mlsys-tinyml Papers Daily using GitHub Actions (Updates Every 8 Hours)☆10Updated this week
- Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.☆201Mar 18, 2026Updated last month
- SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models☆259Updated this week
- Getting Started with Triton: A Tutorial for Python Beginners☆54Mar 26, 2026Updated last month
- Reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated last year
- A size profiler for CUDA binaries☆70Jan 15, 2026Updated 3 months ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs☆20Jun 3, 2025Updated 11 months ago
- High-performance sparse-mask attention CUDA kernel for short sequences, optimized for sequences under 1K with 75% sparsity☆75Mar 18, 2026Updated last month
- CUDA SGEMM optimization note☆15Oct 31, 2023Updated 2 years ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning☆34Jun 13, 2025Updated 10 months ago
- ☆19Nov 10, 2024Updated last year
- ☆89Feb 10, 2026Updated 3 months ago
- ☆24Jun 11, 2025Updated 11 months ago
- This project uses LSTM and Convolutional time series models to predict and forecast Google and Alibaba cluster traces☆10Dec 4, 2020Updated 5 years ago
- Vortex: A Flexible and Efficient Sparse Attention Framework☆53Apr 30, 2026Updated last week
- ☆79Apr 29, 2026Updated last week
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 9 months ago
- Lane line (curve) detection using VC☆10Apr 23, 2018Updated 8 years ago
- JAX backend for SGL☆269Updated this week
- ☆44May 2, 2026Updated last week
- ☆31Jan 7, 2025Updated last year
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆46Jan 8, 2026Updated 4 months ago
- TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference☆14Feb 22, 2022Updated 4 years ago
- ☆25Aug 27, 2021Updated 4 years ago
- My solutions for the UC Berkeley AI Pacman projects☆11Jul 25, 2020Updated 5 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆110Jul 27, 2018Updated 7 years ago
- All kinds of notes; I may sort these in the future☆13Aug 29, 2025Updated 8 months ago
- Nano vLLM☆13Jun 26, 2025Updated 10 months ago
- Data processing of OpenSky COVID-19 Flight Dataset✈️☆16Apr 6, 2024Updated 2 years ago
- A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It …☆152Apr 22, 2026Updated 2 weeks ago
- ☆44Jan 8, 2025Updated last year