kvcache-ai/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

☆16

Alternatives and similar repositories for vllm

Users that are interested in vllm are comparing it to the libraries listed below.

SylviaZiyuZhang / CleANN
Public repository for CleANN, an efficient fully-dynamic approximate nearest neighbor search index
☆16 · Sep 18, 2025 · Updated 10 months ago
4Catalyzer / cyclegan
☆13 · Mar 29, 2019 · Updated 7 years ago
lukedodd / JitCalc
Mathematical expression evaluator with just in time code generation.
☆12 · Apr 7, 2013 · Updated 13 years ago
cxl-micron-reskit / famfs-linux
This repo hosts the famfs kernel patch sets as branches
☆11 · Mar 24, 2026 · Updated 4 months ago
CaucherWang / Steiner-hardness
A new query hardness measure for graph-based ANN indexes. Build unbiased workloads with this hardness to see the actual performance of yo…
☆22 · May 6, 2026 · Updated 2 months ago
PASSIONLab / distributed_sddmm
Distributed SDDMM Kernel
☆12 · Jul 8, 2022 · Updated 4 years ago
apanwariisc / Illuminator
This is the implementation of our research system Illuminator that was published in ASPLOS 2018 with the title "Making Huge Pages Actuall…
☆11 · Sep 11, 2020 · Updated 5 years ago
thustorage / deft
Deft: A Scalable Tree Index for Disaggregated Memory
☆22 · Apr 23, 2025 · Updated last year
hoterran / simple-timing-wheel
timing wheel implementation
☆14 · Jul 25, 2012 · Updated 14 years ago
Luca-Dalmasso / matrixTransposeCUDA
CUDA C simple application for Nvidia's GPU
☆11 · Jun 7, 2022 · Updated 4 years ago
PhantomThief / jedis-helper
☆12 · Mar 31, 2021 · Updated 5 years ago
stonysystems / rolis
Eurosys22' - Rolis: a software approach to efficiently replicating multi-core transactions
☆17 · Feb 28, 2024 · Updated 2 years ago
Mellanox / rdma_fc
Demonstration of flow control over RDMA fabric
☆13 · Jun 28, 2018 · Updated 8 years ago
yiqiaowang / learning
☆12 · Jun 3, 2019 · Updated 7 years ago
Visual-Computing / DynamicExplorationGraph
Repository related to the Dynamic Exploration Graph and its previous iterations.
☆28 · Updated this week
AIS-SNU / PathWeaver
A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
☆21 · Jul 22, 2025 · Updated last year
fabuzaid21 / yggdrasil
Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark
☆30 · May 17, 2018 · Updated 8 years ago
mti-lab / rnn-descent
☆24 · Apr 4, 2024 · Updated 2 years ago
Scientific-Computing-Lab / STREAMer
STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth
☆18 · Aug 21, 2023 · Updated 2 years ago
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
fw-ai / llama-cuda-graph-example
Example of applying CUDA graphs to LLaMA-v2
☆11 · Aug 25, 2023 · Updated 2 years ago
larsgottesbueren / gp-ann
Experimental Code for "Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search"
☆30 · Nov 4, 2024 · Updated last year
howarlii / SAQ
Segmented Code Adjustment Quantization (SAQ)
☆26 · Sep 22, 2025 · Updated 10 months ago
fpgasystems / Chameleon-RAG-Acceleration
☆23 · Jun 1, 2025 · Updated last year
lmccccc / FANNBench
Benchmark for filtered ANN search applications
☆25 · Apr 28, 2026 · Updated 2 months ago
naoyam / MemoryTracer-pintool
☆17 · Aug 4, 2014 · Updated 11 years ago
haoxizhong / TUOJ
Let's discover a new world.
☆10 · Jan 6, 2017 · Updated 9 years ago
TaoLv / mxProfileParser
A simple tool for parsing the profile.json file of mxnet
☆14 · Aug 1, 2018 · Updated 7 years ago
tstamler / zIO
Transparent zero-copy IO
☆26 · Mar 31, 2024 · Updated 2 years ago
NVIDIA / nvbench_demo
Simple starter CMake project that uses NVBench.
☆15 · May 6, 2025 · Updated last year
jychen21 / Habana-LLM-Viewer
☆13 · Jul 24, 2024 · Updated 2 years ago
coolceph / bhook
Baidu Hook
☆13 · Jan 7, 2016 · Updated 10 years ago
DolphinICS / cuda-rdma-bench
NVIDIA GPU direct RDMA using SISCI API
☆18 · Apr 8, 2018 · Updated 8 years ago
TKONIY / tutorial-any-repo
Claude Code skill: Generate file-by-file code tutorial websites for any repository with parallel agent teams
☆28 · Mar 13, 2026 · Updated 4 months ago
pisa-engine / BMP
Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024.
☆37 · Jan 14, 2026 · Updated 6 months ago
jhson989 / cuda-ptx
Inline PTX Assembly in CUDA example
☆15 · May 7, 2022 · Updated 4 years ago
thustorage / GustANN
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]
☆30 · Apr 22, 2026 · Updated 3 months ago
daochenzha / neuroshard
[MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
☆16 · May 5, 2023 · Updated 3 years ago
OctopusMind / RLHF_PPO
PPO algorithm implementation
☆41 · Jun 5, 2024 · Updated 2 years ago