PDZZXL/Awesome-LLM-Serving

Large Language Model (LLM) Serving Paper and Resource List

☆29

Alternatives and similar repositories for Awesome-LLM-Serving

Users that are interested in Awesome-LLM-Serving are comparing it to the libraries listed below.

OwnLabAI / ownlab
Local-first, open-source Claude Science alternative. Web/Desktop App. Claude Code/Codex.
☆21 · Jul 1, 2026 · Updated 2 weeks ago
maestro-project / magma
☆18 · Jun 17, 2022 · Updated 4 years ago
pnnl / soda-opt
☆62 · Jul 1, 2025 · Updated last year
LivingFutureLab / UQABench
[KDD 2025] The source code for UQABench
☆12 · Aug 18, 2025 · Updated 11 months ago
fpgasystems / Chameleon-RAG-Acceleration
☆23 · Jun 1, 2025 · Updated last year
Multi-LLM / prism-research
Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71 · Mar 17, 2026 · Updated 4 months ago
Yitrus / ArtMem
ISCA-2025
☆25 · Mar 3, 2026 · Updated 4 months ago
jhcgt4869 / gugua_helps
Qixi Festival (Chinese Valentine's Day) helper for lonely singles
☆13 · Aug 7, 2021 · Updated 4 years ago
veranki / knn-fpga-hls
FPGA implementation of a handwritten digit recognition system based on the k-nearest-neighbors (k-NN) classifier algorithm.
☆21 · Apr 3, 2018 · Updated 8 years ago
jkehne / GPUswap
Oversubscription of GPU Memory through Transparent Swapping
☆15 · Mar 27, 2015 · Updated 11 years ago
UofT-EcoSystem / Tempo
Memory footprint reduction for transformer models
☆11 · Jan 24, 2023 · Updated 3 years ago
ZhW-loop / UniCoMo
☆13 · Sep 19, 2024 · Updated last year
NNHieu / CoLR-FedRec
Correlated Low-rank Structure (CoLR) for Federated Recommendation System
☆13 · May 31, 2026 · Updated last month
xliu0709 / WinoCNN
An HLS-based Winograd systolic CNN accelerator
☆54 · Jul 18, 2021 · Updated 5 years ago
JerryYin777 / FPGA_Competition-RISC-V_Processor-in-PGL22G
FPGA Innovation Design Competition: RISC-V Processor-based Hardware and Software Design in PGL22G
☆12 · Sep 1, 2023 · Updated 2 years ago
EfficientLLMSys / MuxServe
☆15 · Jun 26, 2024 · Updated 2 years ago
icgrp / prflow_nested_dfx
Fast and Flexible FPGA development using Hierarchical Partial Reconfiguration (FPT 2022)
☆15 · Mar 21, 2024 · Updated 2 years ago
mit-han-lab / SMEPO
☆16 · May 27, 2026 · Updated last month
ss7krd / Usher
☆14 · Nov 7, 2024 · Updated last year
georgia-tech-synergy-lab / hardtaco-hls
HLS project modeling various sparse accelerators.
☆12 · Jan 11, 2022 · Updated 4 years ago
Quangmire / voyager
☆24 · Apr 10, 2022 · Updated 4 years ago
UCLA-VAST / FlexCNN
☆74 · Feb 16, 2023 · Updated 3 years ago
FindDefinition / PCCM
Python C++ Code Manager
☆15 · Sep 29, 2024 · Updated last year
Blaok / fpga-runtime
☆13 · Aug 1, 2024 · Updated last year
dongxianzhe / hydrainfer
An MLLM inference engine for academic research
☆21 · Jan 30, 2026 · Updated 5 months ago
alibaba-damo-academy / K-Forcing
Official implementation for "K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling"
☆16 · Jun 14, 2026 · Updated last month
ic-lab-duth / DRIM4HLS
DUTH RISC-V Microprocessor for High Level Synthesis
☆10 · Jun 23, 2023 · Updated 3 years ago
HKUST-SING / herald
Herald: Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024)
☆23 · May 9, 2024 · Updated 2 years ago
shashank-agg / octoray
☆10 · Mar 20, 2021 · Updated 5 years ago
Harry710887048 / Awesome-Label-Efficient-3D-Object-Detection
Awesome Label-Efficient 3D Object Detection: A curated list of label-efficient 3D object detection: Unsupervised, Weakly-Supervised, Spar…
☆19 · Nov 13, 2025 · Updated 8 months ago
Linking-ai / SCOPE
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆36 · May 28, 2025 · Updated last year
UIUC-ChenLab / ScaleHLS-HIDA
☆63 · Mar 24, 2025 · Updated last year
kmisimn76 / SparseAccel
SIMD-based CNN accelerator using Vitis HLS
☆11 · Jul 15, 2022 · Updated 4 years ago
ucamrl / xrlflow
☆13 · Mar 6, 2023 · Updated 3 years ago
cake-lab / perseus
☆10 · Jul 5, 2023 · Updated 3 years ago
shriramsb / vdnn-plus-plus
Implementation of vDNN++, an improvement over vDNN
☆18 · Dec 7, 2018 · Updated 7 years ago
llumnix-project / llumnix-ray
Efficient and easy multi-instance LLM serving
☆562 · Mar 12, 2026 · Updated 4 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated 2 years ago
matrix97317 / OneNeuralNetwork
This is a cross-chip platform collection of operators and a unified neural network library.
☆17 · Nov 3, 2023 · Updated 2 years ago