zenrran4nlp/Awesome-LLM-Inference-Serving

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zenrran4nlp/Awesome-LLM-Inference-Serving)

zenrran4nlp / Awesome-LLM-Inference-Serving

☆50

Alternatives and similar repositories for Awesome-LLM-Inference-Serving

Users that are interested in Awesome-LLM-Inference-Serving are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

clarkzjw / mmsys24-starlink-livestreaming
View on GitHub
Low-Latency Live Video Streaming over a Low-Earth-Orbit Satellite Network with DASH
☆18Sep 6, 2024Updated last year
JF-D / Parcae
View on GitHub
☆22Apr 22, 2024Updated 2 years ago
Chasing1020 / SHU-CourseSelection
View on GitHub
上海大学本硕博一体化选课系统自动选课工具
☆18Oct 30, 2022Updated 3 years ago
enrico310786 / brain_tumor_classification
View on GitHub
Brain tumor images classification with ResNet, EfficientNet, EfficientNet_V2 and Compact Convolutional Transformers architectures with Py…
☆11Jan 5, 2023Updated 3 years ago
amazon-science / piperag
View on GitHub
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)
☆32Jun 14, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
nlpie-research / Lightweight-Clinical-Transformers
View on GitHub
This project develops compact transformer models tailored for clinical text analysis, balancing efficiency and performance for healthcare…
☆18Mar 26, 2024Updated 2 years ago
LLMkvsys / rethink-kv-compression
View on GitHub
☆24Mar 7, 2025Updated last year
Scientific-Computing-Lab / MPI-rigen
View on GitHub
MPI Code Generation through Domain-Specific Language Models
☆16Nov 19, 2024Updated last year
ThisisBillhe / ZipAR
View on GitHub
[ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…
☆51Mar 25, 2025Updated last year
CodeEval-Pro / CodeEval-Pro
View on GitHub
[ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"
☆40Apr 7, 2025Updated last year
Anonymous1252022 / Megatron-DeepSpeed
View on GitHub
☆18Sep 22, 2024Updated last year
Zh1yuShen / HopWeaver
View on GitHub
Try HopWeaver: The first automatic synthesis framework based on any corpora, with quality approaching manual annotation.
☆27Apr 7, 2026Updated 3 months ago
abdelfattah-lab / TokenButler
View on GitHub
☆27May 12, 2026Updated 2 months ago
zhang677 / PCL-lite
View on GitHub
[ICML 2025] Adaptive Self-improvement LLM Agentic System for ML Library Development
☆17Jan 6, 2026Updated 6 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Thesys-lab / Helix-ASPLOS25
View on GitHub
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆93Oct 15, 2025Updated 9 months ago
hao-ai-lab / vllm-ltr
View on GitHub
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆81Nov 4, 2024Updated last year
yuanmu97 / PacketGame
View on GitHub
[SIGCOMM 2023] PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale
☆15Jul 1, 2023Updated 3 years ago
rumeshmadhusanka / SEP-DDOS-detection
View on GitHub
Application Layer DDoS attack detection using fast entropy computation method
☆13May 18, 2024Updated 2 years ago
braejan / haskell-openai
View on GitHub
Just a Haskell wrapper for OpenAI API calls
☆11Mar 2, 2023Updated 3 years ago
Wang-Ji20 / milewski-ctfp-pdf-zh-cn-translation
View on GitHub
Chinese Translation for Bartosz Milewski's 'Category Theory for Programmers'. 《写给程序员的范畴论》中文翻译欢迎 PR
☆11Oct 4, 2024Updated last year
Dao-AILab / grouped-latent-attention
View on GitHub
☆135May 29, 2025Updated last year
project-etalon / etalon
View on GitHub
LLM Serving Performance Evaluation Harness
☆84Feb 25, 2025Updated last year
SASA-cloud / ICWS-23-HSFL
View on GitHub
Code of "HSFL: Efficient and Privacy-Preserving Offloading for Split and Federated Learning in IoT Services" published on International C…
☆16Oct 30, 2023Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
mit-han-lab / SMEPO
View on GitHub
☆16May 27, 2026Updated last month
shaojiawei07 / Branchy-GNN
View on GitHub
☆23May 29, 2023Updated 3 years ago
Hsword / SpotServe
View on GitHub
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135Feb 22, 2024Updated 2 years ago
aws-samples / amazon-comprehend-medical-omop-notes-mapping
View on GitHub
Use Amazon Comprehend Medical to extract medical insight from notes inside the OMOP Common Data Model
☆14Feb 28, 2019Updated 7 years ago
Toseic / LLM-inference-arxiv-daily
View on GitHub
🎓Automatically Update LLM inference systems Papers Daily using Github Actions (Update Every 12th hours)
☆12Jul 13, 2026Updated last week
dskarlatos / ElasticCuckooHashing
View on GitHub
(elastic) cuckoo hashing
☆17Jun 20, 2020Updated 6 years ago
iulia-b10 / query_transformations
View on GitHub
☆13Jan 8, 2024Updated 2 years ago
Zcchill / Value-Residual-Learning
View on GitHub
☆15Mar 20, 2025Updated last year
PacktPublishing / Microsoft-Power-BI-Performance-Best-Practices-Second-Edition
View on GitHub
"Microsoft Power BI Performance Best Practices - Second Edition, published by Packt"
☆13Mar 2, 2026Updated 4 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
hpdps-group / KVServe
View on GitHub
Service-aware KV-cache compression for bandwidth-efficient disaggregated LLM serving.
☆16Updated this week
Multi-LLM / prism-research
View on GitHub
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71Mar 17, 2026Updated 4 months ago
seolhokim / BipedalWalker-BranchingDQN
View on GitHub
The Easiest Pytorch Implementation of Branching-DQN
☆12Feb 10, 2021Updated 5 years ago
federerjiang / Plato
View on GitHub
Plato is a system for viewport adaptation based bitrate adaptive VR video streaming.
☆15May 1, 2018Updated 8 years ago
kaist-ina / ns3-tlt-rdma-public
View on GitHub
This is an official GitHub repository for the paper, "Towards timeout-less transport in commodity datacenter networks.".
☆15Sep 7, 2022Updated 3 years ago
INT-FlashAttention2024 / INT-FlashAttention
View on GitHub
☆91Jan 23, 2025Updated last year
TianyuFan0504 / awesome-spatio-temporal-graph
View on GitHub
This repository contains a list of papers on spatio-temporal graph, especially about GNNs on S-T graph.
☆18Sep 8, 2023Updated 2 years ago