A single-file educational implementation for understanding vLLM's core concepts and running LLM inference.
☆43Apr 7, 2026Updated 3 weeks ago
Alternatives and similar repositories for cleanvllm
Users who are interested in cleanvllm are comparing it to the libraries listed below.
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…☆239Jan 14, 2026Updated 3 months ago
- ☆23Jan 21, 2026Updated 3 months ago
- Implement some method of LLM KV Cache Sparsity☆40Jun 6, 2024Updated last year
- Code for "Pre-training with Contrastive Learning for Unified Log Analytics"☆21Jan 22, 2024Updated 2 years ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)☆13Feb 7, 2026Updated 2 months ago
- ☆42Oct 11, 2025Updated 6 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.☆92Jul 17, 2025Updated 9 months ago
- Applying PBT optimization technique to different domains☆10Oct 16, 2019Updated 6 years ago
- A car re-identification app based on multi-feature fusion technique☆18Apr 24, 2022Updated 4 years ago
- Official codebase for Adaptive Online Planning for Continual Lifelong Learning.☆17Mar 26, 2020Updated 6 years ago
- A light llama-like llm inference framework based on the triton kernel.☆184Jan 5, 2026Updated 4 months ago
- MVE: model-based value estimation☆11Jul 30, 2018Updated 7 years ago
- ☆12Jun 11, 2021Updated 4 years ago
- ☆55Sep 18, 2025Updated 7 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆31Dec 21, 2024Updated last year
- Self implementation of course projects for Computer Architecture 2022 Spring☆11Sep 17, 2022Updated 3 years ago
- DRAM/SSD hybrid caching system☆15Mar 13, 2025Updated last year
- ☆12Jun 17, 2022Updated 3 years ago
- ☆27Feb 27, 2025Updated last year
- ☆15Jun 14, 2022Updated 3 years ago
- LLM training parallelisms (DP, FSDP, TP, PP) in pure C☆28Jan 27, 2026Updated 3 months ago
- ☆11Mar 26, 2024Updated 2 years ago
- A reading list on popular MLSys topics☆24Mar 20, 2025Updated last year
- Cowic is a C++ library to compress formatted log like Apache access log.☆11May 3, 2015Updated 11 years ago
- RISC-V SingleCycle/Pipeline CPU (lab of ZJU Computer System Series)☆16Jul 6, 2023Updated 2 years ago
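- Tsinghua University Department of Electronic Engineering, Fundamentals of Digital Logic and Processors lab final project: pipelined CPU☆12Aug 8, 2021Updated 4 years ago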
- Fuzzing compression libraries☆20Jan 10, 2016Updated 10 years ago
- Rust development environment based on Docker.☆12Sep 7, 2021Updated 4 years ago
- ☆17Apr 15, 2025Updated last year
- Compare how WebAssembly build size depends on the imported packages.☆12Dec 11, 2018Updated 7 years ago
- A universal C++ compression library based on wavelet transformation☆12Jun 14, 2024Updated last year
- A five-stage pipelined CPU based on the RISC_V32I instruction set architecture☆15Sep 30, 2019Updated 6 years ago
- Algorithmic implementation of lossy and lossless image compression and decompression☆11Nov 27, 2016Updated 9 years ago
- Optimize softmax in triton in many cases☆24Sep 6, 2024Updated last year
- DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression☆11Oct 7, 2020Updated 5 years ago
- EWAH Compressed Bitmaps☆25Sep 25, 2020Updated 5 years ago
- Some Compression streams (gzip, snappy, lz4) implementing the ZeroCopy Interface from Google(TM) protobuf 2.4.1☆19Apr 30, 2013Updated 13 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention.☆34Nov 29, 2024Updated last year
- FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA.☆276Apr 29, 2026Updated last week