amulil/cleanvllm

A single-file educational implementation for understanding vLLM's core concepts and running LLM inference.

☆45

Alternatives and similar repositories for cleanvllm

Users who are interested in cleanvllm are comparing it to the repositories listed below.

gogongxt / nano-vllm
Nano vLLM
☆25 · Aug 11, 2025 · Updated 11 months ago
KJLdefeated / RL.cu
RLVR training for LLM in CUDA/C++
☆42 · Updated this week
EugenHotaj / llm_parallelisms.c
LLM training parallelisms (DP, FSDP, TP, PP) in pure C
☆29 · Jan 27, 2026 · Updated 6 months ago
HarryWu99 / llm_kvcache_sparsity
Implement some method of LLM KV Cache Sparsity
☆41 · Jun 6, 2024 · Updated 2 years ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆188 · Jan 5, 2026 · Updated 6 months ago
wejoncy / sfllm
Super fast serving stack for LLM on Windows/Linux/Macos
☆17 · Dec 17, 2025 · Updated 7 months ago
smart-lty / nano-PEARL
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211 · Mar 18, 2026 · Updated 4 months ago
TheToughCrane / nano-kvllm
This project aims to provide a high effective KV cache manage framework for llm inference and improve memory utilization and inference sp…
☆69 · Apr 24, 2026 · Updated 3 months ago
AlibabaPAI / FLASHNN
☆106 · Sep 9, 2024 · Updated last year
ModelTC / L2_Compression
☆13 · Jun 16, 2024 · Updated 2 years ago
mlsysAE2022 / ae_mlsys_gnn
☆11 · Mar 9, 2022 · Updated 4 years ago
stonet-research / cheops25-IO-characterization-of-LLM-model-kv-cache-offloading-nvme
☆19 · Apr 15, 2025 · Updated last year
facebookresearch / taser-tgnn
[IPDPS 2024] Adaptive neighbor sampling for temporal GNN
☆16 · Feb 17, 2025 · Updated last year
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆241 · Jan 14, 2026 · Updated 6 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆356 · Nov 2, 2025 · Updated 8 months ago
HuXia7157 / garbage-classification-system
Trains an 18-layer residual neural network with PyTorch, with a user interface built in PyQt
☆12 · Jun 23, 2020 · Updated 6 years ago
sahandrez / homomorphic_policy_gradient
Author's PyTorch Implementation of Deep Homomorphic Policy Gradient (DHPG) - NeurIPS 2022 and JMLR 2024
☆24 · Apr 8, 2024 · Updated 2 years ago
qgallouedec / deep_rl
Single-file truly minimal implementation of state-of-the-art reinforcement learning algorithms.
☆21 · Feb 13, 2023 · Updated 3 years ago
zhaochenyang20 / ModelServer
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆62 · Nov 8, 2024 · Updated last year
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆170 · Dec 23, 2025 · Updated 7 months ago
zju-jiyicheng / LVSpec
[ACL 2026 Main] See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video …
☆27 · Jul 4, 2026 · Updated 3 weeks ago
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆39 · Oct 5, 2025 · Updated 9 months ago
candlezang / up-embedded-QT
尚观 embedded systems course: Qt GUI course
☆11 · Dec 26, 2016 · Updated 9 years ago
difey / nano-vllm-v1
Nano vLLM v1 engine
☆16 · Aug 6, 2025 · Updated 11 months ago
DZY122 / DiTAS
DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)
☆13 · Feb 7, 2026 · Updated 5 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92 · Jul 17, 2025 · Updated last year
dhruvramani / gym-render-browser
Render RL environments on a web browser with just one extra line of code.
☆20 · Sep 10, 2020 · Updated 5 years ago
kyegomez / FastFF
Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling"
☆16 · Nov 11, 2024 · Updated last year
adsl-rg / adsl-rg.github.io
☆14 · Jul 22, 2026 · Updated last week
adarshxs / TokenTally
Estimate Your LLM's Token Toll Across Various Platforms and Configurations
☆39 · Nov 9, 2025 · Updated 8 months ago
Gumpest / MasKD
Official implementation of paper "Masked Distillation with Receptive Tokens", ICLR 2023.
☆10 · Mar 13, 2023 · Updated 3 years ago
annosubmission / GRC-Cache
☆16 · Mar 13, 2023 · Updated 3 years ago
ovg-project / GVM
☆23 · Jan 18, 2026 · Updated 6 months ago
Serkonosand / ParallelProgramming2021
Lab assignments for Zheng Qilong's 2021 Parallel Programming course at USTC
☆11 · Jan 15, 2022 · Updated 4 years ago
liblaf / ilatex
📚 LaTeX templates and tools for creating beautiful, structured documents 📝
☆14 · Oct 24, 2025 · Updated 9 months ago
sonnyli / flash_attention_from_scratch
Flash Attention from Scratch on CUDA Ampere
☆187 · Sep 1, 2025 · Updated 10 months ago
zhaoyutim / DeepSeek-PM
AI Hedge Fund Repo integrate with DeepSeek V3 and R1 hosted on SiliconFlow.
☆12 · Feb 3, 2025 · Updated last year
wangzhaode / onnx-llm
llm deploy project based onnx.
☆49 · Oct 9, 2024 · Updated last year
elsheikh21 / population-based-training-of-NNs
Applying PBT optimization technique to different domains
☆10 · Oct 16, 2019 · Updated 6 years ago