LDLINGLINGLING/nano_vllm_note

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/LDLINGLINGLING/nano_vllm_note)

LDLINGLINGLING / nano_vllm_note

注释的nano_vllm仓库，并且完成了MiniCPM4的适配以及注册新模型的功能

☆201

Alternatives and similar repositories for nano_vllm_note

Users that are interested in nano_vllm_note are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

GeeeekExplorer / nano-vllm
View on GitHub
Nano vLLM
☆14,679Apr 26, 2026Updated 3 months ago
out-or-outstanding / tinyvllm
View on GitHub
复现 nanovllm并增加注释
☆15Jan 27, 2026Updated 6 months ago
Wenyueh / MinivLLM
View on GitHub
Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation
☆944Jul 22, 2026Updated last week
CalvinXKY / InfraTech
View on GitHub
分享AI Infra知识&代码练习：PyTorch、vLLM/SGLang、slime/vime框架入门⚡️、性能加速🚀、大模型基础🧠、AI软硬件🔧等
☆3,218Updated this week
sgl-project / mini-sglang
View on GitHub
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,645May 17, 2026Updated 2 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
gogongxt / nano-sglang
View on GitHub
☆161Mar 5, 2026Updated 4 months ago
smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211Mar 18, 2026Updated 4 months ago
linzm1007 / nano-vllm-ascend
View on GitHub
nano-vllm是开源的一个gpu推理项目，基于开源版本弄的一个ascend npu版本推理小demo，旨在帮助初学者了解推理的整体流程，区别于vllm，nano-vllm体量更小，麻雀虽小五脏俱全，更有助于初学者学习。
☆120May 4, 2026Updated 2 months ago
xlite-dev / LeetCUDA
View on GitHub
LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.
☆11,662Updated this week
Tongkaio / CUDA_Kernel_Samples
View on GitHub
CUDA 算子手撕与面试指南
☆1,059Aug 23, 2025Updated 11 months ago
gogongxt / nano-vllm
View on GitHub
Nano vLLM
☆25Aug 11, 2025Updated 11 months ago
dhcode-cpp / DeepSeek-V4-mini
View on GitHub
DeepSeek-V4 Lecture
☆24Jun 5, 2026Updated last month
rchardx / hopper-gemm
View on GitHub
☆48Nov 1, 2025Updated 8 months ago
luliyucoordinate / cute-flash-attention
View on GitHub
Implement Flash Attention using Cute.
☆111Dec 17, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
HJCheng0602 / nanoPD
View on GitHub
A from-scratch Prefill/Decode disaggregation inference engine for LLMs
☆160May 10, 2026Updated 2 months ago
Infrasys-AI / AIInfra
View on GitHub
AIInfra（AI 基础设施）指AI系统从底层芯片等硬件，到上层软件栈支持AI大模型训练和推理。
☆7,758Dec 22, 2025Updated 7 months ago
zjhellofss / KuiperLLama
View on GitHub
校招、秋招、春招、实习好项目，带你从零动手实现支持LLama2/3和Qwen2.5的大模型推理框架。
☆555Oct 28, 2025Updated 9 months ago
sonnyli / flash_attention_from_scratch
View on GitHub
Flash Attention from Scratch on CUDA Ampere
☆188Sep 1, 2025Updated 10 months ago
ForceInjection / cuda-code-skill
View on GitHub
将 NVIDIA PTX ISA 9.1、CUDA 13.1 (Runtime/Driver)、Math API 13.x、cuBLAS 13.2 及 NCCL 官方文档转换为易于检索的 Markdown 格式，并提供配套的 AI IDE 技能库（支持 Claude Cod…
☆20Updated this week
difey / nano-vllm-v1
View on GitHub
Nano vLLM v1 engine
☆16Aug 6, 2025Updated 11 months ago
wu-kan / wuk_cupti_wrapper
View on GitHub
a simple API to use CUPTI
☆10Aug 19, 2025Updated 11 months ago
CalvinXKY / BasicCUDA
View on GitHub
A tutorial for CUDA&PyTorch
☆481Mar 23, 2026Updated 4 months ago
ArthurinRUC / cutlass-notes
View on GitHub
From Minimal GEMM to Everything
☆230Jul 9, 2026Updated 2 weeks ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
keith2018 / TinyTorch
View on GitHub
A lightweight deep learning training framework implemented from scratch in C++, featuring a PyTorch-style API.
☆203Jun 23, 2026Updated last month
harleyszhang / lite_llama
View on GitHub
A light llama-like llm inference framework based on the triton kernel.
☆188Jan 5, 2026Updated 6 months ago
xgqdut2016 / hpc_project
View on GitHub
some hpc project for learning
☆28Aug 28, 2024Updated last year
cyhdmjzzy / DeepEP-Code-Analysis
View on GitHub
☆26Feb 27, 2026Updated 5 months ago
openmlir / mlir-tutorial
View on GitHub
Hands-On Practical MLIR Tutorial
☆60Aug 21, 2025Updated 11 months ago
DD-DuDa / Cute-Learning
View on GitHub
Examples of CUDA implementations by Cutlass CuTe
☆281Jul 1, 2025Updated last year
yuezhouhu / adaspec
View on GitHub
A selective knowledge distillation algorithm for efficient speculative decoders
☆39Nov 27, 2025Updated 8 months ago
shizhengLi / cuda-triton-learning
View on GitHub
CUDA & Triton Learning Project: Flash Attention 实现探索
☆37Aug 14, 2025Updated 11 months ago
cr7258 / ai-infra-learning
View on GitHub
This repository organizes materials, recordings, and schedules related to AI-infra learning meetings.
☆542Mar 1, 2026Updated 4 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
hengshan / Cuda-Tutorials
View on GitHub
CUDA 13.1 Tutorial Series for RTX 5090 (Blackwell) - Chinese teaching materials
☆29Jan 18, 2026Updated 6 months ago
sunkx109 / nano-vllm
View on GitHub
Nano vLLM
☆23Mar 28, 2026Updated 4 months ago
melonedo / algebraic-layouts
View on GitHub
☆23Aug 20, 2025Updated 11 months ago
dsl-learn / cuda-magic
View on GitHub
fake CUTLASS to get peformance
☆26Apr 28, 2026Updated 3 months ago
skyzh / tiny-llm
View on GitHub
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆4,425Updated this week
kvcache-ai / Mooncake
View on GitHub
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆6,083Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
View on GitHub
My learning notes for ML SYS.
☆6,789Updated this week