☆157Mar 5, 2026Updated 3 months ago
Alternatives and similar repositories for nano-sglang
Users that are interested in nano-sglang are comparing it to the libraries listed below.
- Cute layout visualization☆41Jan 18, 2026Updated 5 months ago
- A lightweight, llama-like LLM inference framework built on Triton kernels.☆188Jan 5, 2026Updated 5 months ago
- A Triton-only attention backend for vLLM☆26Mar 17, 2026Updated 3 months ago
- Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language☆309May 31, 2026Updated 3 weeks ago
- LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model☆118Apr 28, 2026Updated 2 months ago
- Large language model deployment on the Ascend 310 chip☆28Jun 14, 2024Updated 2 years ago
- A simple calculation for LLM MFU.☆78Sep 10, 2025Updated 9 months ago
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)☆10Updated this week
- Operator library☆17Jul 9, 2025Updated 11 months ago
- Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.☆208Mar 18, 2026Updated 3 months ago
- resizable hashing strategy for large-scale storage☆25Oct 6, 2019Updated 6 years ago
- Getting Started with Triton: A Tutorial for Python Beginners☆60Mar 26, 2026Updated 3 months ago
- A reproduction of the libsmctrl paper, with an added Python interface so that compute resources can be allocated flexibly from Python☆12May 21, 2024Updated 2 years ago
- A size profiler for CUDA binaries☆69Jan 15, 2026Updated 5 months ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs☆20Jun 3, 2025Updated last year
- CUDA SGEMM optimization note☆15Oct 31, 2023Updated 2 years ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning☆36Jun 13, 2025Updated last year
- ☆19Nov 10, 2024Updated last year
- ☆13Aug 1, 2023Updated 2 years ago
- ☆25Jun 11, 2025Updated last year
- This project uses LSTM and Convolutional time series models to predict and forecast Google and Alibaba cluster traces☆10Dec 4, 2020Updated 5 years ago
- Code for estimating cloud data center workload with ARIMA, SARIMA, LSTM, and RNN models.☆12Nov 1, 2021Updated 4 years ago
- Vortex: Programmable Sparse Attention for Agents as Algorithm Designers☆62Jun 8, 2026Updated 3 weeks ago
- ☆39Updated this week
- "Introduction to Deep Learning 2: Building Your Own Framework" (Building Deep Learning Framework)☆45May 22, 2024Updated 2 years ago
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 11 months ago
- JAX backend for SGL☆288Jun 22, 2026Updated last week
- Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives.☆140Updated this week
- ☆31Jan 7, 2025Updated last year
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆51Jan 8, 2026Updated 5 months ago
- TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference☆14Feb 22, 2022Updated 4 years ago
- ☆25Aug 27, 2021Updated 4 years ago
- My solutions for the UC Berkeley AI Pacman projects☆11Jul 25, 2020Updated 5 years ago
- Study notes☆12Mar 7, 2026Updated 3 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆110Jul 27, 2018Updated 7 years ago
- All kinds of notes; I may sort these in the future☆13Aug 29, 2025Updated 10 months ago
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023)☆13Dec 21, 2023Updated 2 years ago
- An amateur personal translation of "Clang Compiler Frontend"☆39Aug 10, 2024Updated last year
- Data processing of OpenSky COVID-19 Flight Dataset✈️☆15Apr 6, 2024Updated 2 years ago