caoshiyi/artifacts

☆40

Alternatives and similar repositories for artifacts

Users who are interested in artifacts are comparing it to the repositories listed below.

PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆118 · Dec 15, 2025 · Updated 7 months ago
SiriusInfTra / Sirius
☆18 · Sep 21, 2025 · Updated 10 months ago
NEO-MLSys25 / NEO
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆99 · Jun 16, 2025 · Updated last year
LLMServe / hydraserve
☆20 · May 11, 2026 · Updated 2 months ago
alibaba / hap
☆16 · Apr 13, 2024 · Updated 2 years ago
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
goliaro / specinfer-ae
☆28 · Mar 14, 2024 · Updated 2 years ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆409 · Mar 12, 2026 · Updated 4 months ago
promoe-opensource / promoe
☆20 · Jan 27, 2025 · Updated last year
EfficientLLMSys / MuxServe
☆15 · Jun 26, 2024 · Updated 2 years ago
YuchongHu / echash
ACM SoCC 2019, "Coupling Decentralized Key-Value Stores with Erasure Coding"
☆15 · May 22, 2021 · Updated 5 years ago
pie-project / pie
Pie: Programmable LLM Serving
☆184 · Updated this week
tanzelin430 / libsmctrl
A reproduction of the libsmctrl paper, with an added Python interface so that compute resources can be allocated flexibly from Python
☆12 · May 21, 2024 · Updated 2 years ago
Geeloon / hexagon_examples
some hexagon intrinsic examples based on Qualcomm Hexagon
☆17 · Mar 7, 2025 · Updated last year
Leo9660 / HedraRAG_AE
Artifact Evaluation for SOSP 2025
☆21 · Aug 16, 2025 · Updated 11 months ago
hyungyokim / LIA_AMXGPU
[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
☆13 · Jun 28, 2025 · Updated last year
Hazuyuki / PIM-HLS
☆12 · Aug 18, 2023 · Updated 2 years ago
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆267 · Nov 18, 2024 · Updated last year
hyhuang00 / moe_inference
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19 · Oct 30, 2024 · Updated last year
yongwonshin / PIMFlow
☆15 · Mar 10, 2024 · Updated 2 years ago
ProjectMitosisOS / dmerge-eurosys24-ae
Artifact evaluation repo for EuroSys'24.
☆29 · Nov 7, 2023 · Updated 2 years ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47 · May 13, 2025 · Updated last year
nicknochnack / WatsonxFineTuning
☆10 · Jul 8, 2023 · Updated 3 years ago
Stardust-SJF / cuvs_rabitq
cuVS - a library for vector search and clustering on the GPU. The IVF RaBitQ is under the cuvs_ivf_rabitq branch.
☆19 · Jun 18, 2026 · Updated last month
ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆63 · Mar 5, 2025 · Updated last year
uclasystem / DRust
☆54 · Oct 10, 2024 · Updated last year
SuDIS-ZJU / llm-inference-all-in-one
☆19 · Feb 18, 2025 · Updated last year
bespoke-silicon-group / reallm
☆18 · May 19, 2025 · Updated last year
google / rago
☆31 · Jun 22, 2025 · Updated last year
Yitrus / ArtMem
ISCA-2025
☆25 · Mar 3, 2026 · Updated 4 months ago
ranggihwang / Pregated_MoE
☆62 · May 4, 2024 · Updated 2 years ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
VITA-Group / Q-Hitter
☆15 · Jun 4, 2024 · Updated 2 years ago
AutonomicPerfectionist / PipeInfer
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
☆32 · Nov 16, 2024 · Updated last year
ChandlerGuan / mercury_artifact
☆27 · Oct 1, 2025 · Updated 9 months ago
stonet-research / cheops25-IO-characterization-of-LLM-model-kv-cache-offloading-nvme
☆19 · Apr 15, 2025 · Updated last year
OpenBitSys / BitDecoding
[HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.
☆96 · May 14, 2026 · Updated 2 months ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆321 · Updated this week
hpdps-group / KVServe
Service-aware KV-cache compression for bandwidth-efficient disaggregated LLM serving.
☆16 · Updated this week