NEO-MLSys25/NEO

NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading.

☆100

Alternatives and similar repositories for NEO

Users who are interested in NEO are comparing it to the repositories listed below.

interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆329 · Jun 10, 2025 · Updated last year
caoshiyi / artifacts
☆40 · Nov 28, 2024 · Updated last year
ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆63 · Mar 5, 2025 · Updated last year
AIS-SNU / GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10 · Aug 13, 2024 · Updated last year
LLMServe / hydraserve
☆20 · May 11, 2026 · Updated 2 months ago
ece-fast-lab / ISCA-2025-LIA
[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
☆25 · Jan 6, 2026 · Updated 6 months ago
open-neutrino / neutrino
☆264 · Dec 25, 2025 · Updated 7 months ago
microsoft / RetrievalAttention
[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.
☆149 · Feb 22, 2026 · Updated 5 months ago
OpenBitSys / BitDecoding
[HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache.
☆96 · May 14, 2026 · Updated 2 months ago
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆285 · Sep 28, 2025 · Updated 9 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆826 · Apr 6, 2025 · Updated last year
ByteDance-Seed / ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆310 · May 1, 2025 · Updated last year
zejia-lin / BulletServe
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53 · Jan 8, 2026 · Updated 6 months ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47 · May 13, 2025 · Updated last year
blitz-serving / blitz-scale
The official implementation of the OSDI'25 paper BlitzScale
☆48 · Apr 15, 2026 · Updated 3 months ago
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆52 · Jul 21, 2025 · Updated last year
microsoft / vidur
Accurate, large-scale, and extensible simulator for LLM inference systems
☆646 · Jul 25, 2025 · Updated last year
utnslab / Medes
Deduplication over disaggregated memory for serverless computing
☆14 · Mar 21, 2022 · Updated 4 years ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆506 · Jul 17, 2026 · Updated last week
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆155 · Jan 18, 2025 · Updated last year
OrderLab / TrainCheck
An Observability Framework for AI Training
☆73 · Jul 16, 2026 · Updated last week
llumnix-project / llumnix-ray
Efficient and easy multi-instance LLM serving
☆563 · Mar 12, 2026 · Updated 4 months ago
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆1,115 · Updated this week
mental2008 / awesome-papers
Here are my personal paper reading notes (including machine learning systems, AI infrastructure, and other interesting topics).
☆216 · Jul 19, 2026 · Updated last week
mingluo-su / ROSE
[CPAL 2026 oral] Official implementation of "ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning"
☆16 · Apr 21, 2026 · Updated 3 months ago
NoakLiu / PiKV
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
☆61 · Updated this week
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆969 · Mar 29, 2026 · Updated 3 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139 · Nov 26, 2025 · Updated 8 months ago
NetX-lab / Ayo
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
☆75 · Mar 11, 2026 · Updated 4 months ago
ece-fast-lab / ASPLOS-2025-M5
This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …
☆17 · Apr 1, 2025 · Updated last year
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆321 · Updated this week
LLMkvsys / rethink-kv-compression
☆24 · Mar 7, 2025 · Updated last year
yzlnew / infra-skills
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…
☆140 · Jul 9, 2026 · Updated 2 weeks ago
xinhaoc / ferret
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17 · Jun 17, 2026 · Updated last month
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,999 · Updated this week
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆45 · Nov 13, 2023 · Updated 2 years ago
SuDIS-ZJU / llm-inference-all-in-one
☆19 · Feb 18, 2025 · Updated last year
eunomia-bpf / agentcgroup
AgentCgroup: Understanding and Controlling OS Resources of AI Agents
☆61 · Updated this week
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆340 · Jul 16, 2026 · Updated last week