wassemgtk/MegaScale-Infer-Prototyp

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/wassemgtk/MegaScale-Infer-Prototyp)

wassemgtk / MegaScale-Infer-Prototyp

Prototyp MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

☆31

Alternatives and similar repositories for MegaScale-Infer-Prototyp

Users that are interested in MegaScale-Infer-Prototyp are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

RC4ML / RPCNIC
View on GitHub
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆15Dec 9, 2024Updated last year
tonyzhao-jt / LLM-PQ
View on GitHub
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39Aug 29, 2025Updated 10 months ago
dorpxam / einops-cpp
View on GitHub
C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with einstein-like notation
☆12Oct 16, 2023Updated 2 years ago
AIS-SNU / GraNNDis_Artifact
View on GitHub
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10Aug 13, 2024Updated last year
wangyccn / CR-AI-V1.5
View on GitHub
CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks…
☆16Apr 29, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
heheda12345 / Jenga-SOSP25-AE
View on GitHub
☆15Aug 16, 2025Updated 11 months ago
hongsunjang / pipe-bd
View on GitHub
[DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation
☆12Jul 13, 2023Updated 3 years ago
Bigyehahaha / M4
View on GitHub
The code of 《M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis》
☆14Mar 31, 2025Updated last year
stepfun-ai / StepMesh
View on GitHub
☆377Jan 28, 2026Updated 5 months ago
SNU-ARC / Ginex
View on GitHub
Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching
☆43Jul 10, 2024Updated 2 years ago
FlexFusion / FlexFusion
View on GitHub
The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221
☆31Apr 22, 2025Updated last year
IntelliSys-Lab / FineMoE-EuroSys26
View on GitHub
☆15Sep 25, 2025Updated 9 months ago
wrmedford / moe-scaling
View on GitHub
Scaling Laws for Mixture of Experts Models
☆15Feb 25, 2025Updated last year
tianyi-lab / R2-T2
View on GitHub
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆19Mar 10, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
a1bc2def6g / fastgl-ae
View on GitHub
☆17Jun 25, 2024Updated 2 years ago
zdebruine / MMVAE
View on GitHub
Mixture-of-Experts Multimodal Variational Autoencoder
☆15Jul 3, 2025Updated last year
microsoft / tokenweave
View on GitHub
Accepted to MLSys 2026
☆91Apr 19, 2026Updated 3 months ago
Toseic / LLM-inference-arxiv-daily
View on GitHub
🎓Automatically Update LLM inference systems Papers Daily using Github Actions (Update Every 12th hours)
☆12Jul 13, 2026Updated last week
ChaseLab-PKU / InstAttention
View on GitHub
InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
☆18Mar 30, 2025Updated last year
The-Swarm-Corporation / Mamba-R1
View on GitHub
Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…
☆25Oct 13, 2025Updated 9 months ago
scale-snu / layered-prefill
View on GitHub
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall fre…
☆18Mar 9, 2026Updated 4 months ago
iamkanghyunchoi / falqon
View on GitHub
Official repository of paper [FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic, NeurIPS 2025]
☆21Dec 2, 2025Updated 7 months ago
hongsunjang / HILOS
View on GitHub
[ASPLOS'26] HILOS: A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs
☆20Jan 18, 2026Updated 6 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
CalvinXKY / EPLB_visualization
View on GitHub
Visualize the Expert Parallelism Load Balancer
☆19Mar 15, 2025Updated last year
SNU-ARC / flashneuron
View on GitHub
☆41Nov 28, 2022Updated 3 years ago
iamkanghyunchoi / ait
View on GitHub
It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher [CVPR 2022 Oral]
☆29Sep 15, 2022Updated 3 years ago
he-h / ST-MoE-BERT
View on GitHub
This repository contains the code for the paper "ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mo…
☆16Feb 20, 2025Updated last year
epfml / pam
View on GitHub
☆16Dec 9, 2023Updated 2 years ago
AIS-SNU / PathWeaver
View on GitHub
A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
☆21Jul 22, 2025Updated 11 months ago
jaehongm / eZNS
View on GitHub
☆14Aug 2, 2023Updated 2 years ago
ChenZiHong-Gavin / MoE-Visualizer
View on GitHub
MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models.
☆16Apr 8, 2025Updated last year
Xilinx / libdfx
View on GitHub
☆13Jun 14, 2026Updated last month
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
second-state / wasmedge-wasi-nn
View on GitHub
High-level bindings for wasi-nn system calls
☆20Aug 29, 2025Updated 10 months ago
ttw1018 / MoPE-DST
View on GitHub
The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"
☆19Jan 25, 2025Updated last year
fpgasystems / Chameleon-RAG-Acceleration
View on GitHub
☆23Jun 1, 2025Updated last year
UnifiedTSAI / UnifiedTSLib
View on GitHub
☆20Jul 29, 2025Updated 11 months ago
NetX-lab / Echo-slowdown
View on GitHub
Slowdown prediction module of Echo: Simulating Distributed Training at Scale
☆13Jul 11, 2026Updated last week
muriloboratto / NVSHEMEM
View on GitHub
Sample Codes using NVSHMEM on Multi-GPU
☆30Jan 22, 2023Updated 3 years ago
Relaxed-System-Lab / HexGen
View on GitHub
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆37May 6, 2024Updated 2 years ago