timlee0212/SiDA-MoE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/timlee0212/SiDA-MoE)

timlee0212 / SiDA-MoE

Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

☆22

Alternatives and similar repositories for SiDA-MoE

Users that are interested in SiDA-MoE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

MaoZiming / papers
View on GitHub
Paper-reading notes for Berkeley OS prelim exam.
☆14Aug 28, 2024Updated last year
csl-iisc / SUV-MICRO24
View on GitHub
☆13Oct 6, 2024Updated last year
thomashirtz / gym-battleship
View on GitHub
Battleship environment for reinforcement learning tasks
☆14Apr 29, 2023Updated 3 years ago
KarineAyrs / knowledge-distillation-semantic-search
View on GitHub
KDSS is the framework for knowledge distillation from LLMs
☆12Nov 5, 2025Updated 8 months ago
hyhuang00 / moe_inference
View on GitHub
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19Oct 30, 2024Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
spcl / crosspipe
View on GitHub
Official implementation of CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training (ATC '25), built on top of Megatro…
☆17Jul 6, 2025Updated last year
wangdongjie100 / KDD2020
View on GitHub
Incremental Mobile User Profiling: Reinforcement Learning with Spatial Knowledge Graph for Modeling Event Streams
☆15Jul 25, 2024Updated last year
Zishan-Shao / FlashSVD
View on GitHub
[AAAI 2026] Official implementation of "FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models". If you find this reposi…
☆17May 1, 2026Updated 2 months ago
YJHMITWEB / ExFlow
View on GitHub
Explore Inter-layer Expert Affinity in MoE Model Inference
☆16May 6, 2024Updated 2 years ago
EfficientMoE / MoE-Infinity
View on GitHub
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆321Updated this week
uwsampl / paper-agents
View on GitHub
☆13Dec 9, 2024Updated last year
casys-kaist / glet
View on GitHub
☆53Dec 26, 2024Updated last year
MoE-Inf / awesome-moe-inference
View on GitHub
Curated collection of papers in MoE model inference
☆409Mar 12, 2026Updated 4 months ago
WukLab / InferCept
View on GitHub
☆34Jun 22, 2024Updated 2 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
efeslab / fiddler
View on GitHub
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆267Nov 18, 2024Updated last year
tonyzhao-jt / LLM-PQ
View on GitHub
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39Aug 29, 2025Updated 10 months ago
OSH-2022 / x-runikraft
View on GitHub
2022 USTC 011705 (OSH) Course Project of Runikraft Group
☆13Jul 22, 2022Updated 4 years ago
UNITES-Lab / Occult
View on GitHub
[ICML‘25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13Apr 17, 2025Updated last year
Scientific-Computing-Lab / STREAMer
View on GitHub
STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth
☆18Aug 21, 2023Updated 2 years ago
uclaml / MoE
View on GitHub
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆35Dec 12, 2023Updated 2 years ago
s4-lab-cuhksz / torpor
View on GitHub
☆21May 27, 2025Updated last year
awslabs / Lancet-Accelerating-MoE-Training-via-Whole-Graph-Computation-Communication-Overlapping
View on GitHub
Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapp…
☆14May 20, 2026Updated 2 months ago
rapidsai / ucxx
View on GitHub
☆68Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
SJTU-IPADS / disb
View on GitHub
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆58Aug 21, 2024Updated last year
octohelm / wagon
View on GitHub
deprecated, use https://github.com/octohelm/piper instead.
☆14Sep 3, 2024Updated last year
eniac / TELEPORT
View on GitHub
Optimizing data-intensive systems in disaggregated data centers
☆13Jun 13, 2022Updated 4 years ago
flashinfer-ai / debug-print
View on GitHub
Debug print operator for cudagraph debugging
☆18Aug 2, 2024Updated last year
LLMServe / dLoRA-artifact
View on GitHub
☆32May 28, 2024Updated 2 years ago
modelscope / Katz
View on GitHub
[ATC'25] Katz is a high-performance serving system designed specifically for diffusion model workflows with multiple adapters.
☆24May 26, 2025Updated last year
microsoft / glinthawk
View on GitHub
An LLM inference engine, written in C++
☆20Mar 30, 2026Updated 3 months ago
vbdi / epdserve
View on GitHub
[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation
☆24Jul 11, 2026Updated last week
sparanoid / omnibox-ncr
View on GitHub
Force google.com (aka Google No Country Redirect) when searching in Google Chrome Omnibox
☆10Jul 7, 2026Updated 2 weeks ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
HeegyuKim / korouge
View on GitHub
Google 공식 Rouge Implementation을 한국어에서 사용할 수 있도록 처리
☆17Jan 3, 2024Updated 2 years ago
Cloud-and-Distributed-Systems / Erms
View on GitHub
☆27Nov 15, 2024Updated last year
tanyuqian / redco
View on GitHub
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆69Dec 9, 2024Updated last year
ss7krd / Usher
View on GitHub
☆14Nov 7, 2024Updated last year
zhuangwang93 / Espresso
View on GitHub
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2…
☆15Sep 21, 2023Updated 2 years ago
ustcllm / Nexus
View on GitHub
Toolkit for Universal Retrieval, such as text retrieval, item recommendation, image retrieval, etc.
☆17Sep 15, 2025Updated 10 months ago
ochinchina / go-ini
View on GitHub
parser for a microsoft .ini format file & java .properties file in golang
☆13Jul 8, 2018Updated 8 years ago