CASE-Lab-UMD/Unified-MoE-Compression

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/CASE-Lab-UMD/Unified-MoE-Compression)

CASE-Lab-UMD / Unified-MoE-Compression

The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".

☆89

Alternatives and similar repositories for Unified-MoE-Compression

Users that are interested in Unified-MoE-Compression are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Shwai-He / PAD-Net
View on GitHub
Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks".
☆14Feb 28, 2026Updated 4 months ago
CASE-Lab-UMD / Capacity-Aware-MoE
View on GitHub
The official implementation of the paper "Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts" (ICLR 2026).
☆20May 31, 2026Updated last month
CASE-Lab-UMD / Router-Tuning-Mixture-of-Depths
View on GitHub
The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E…
☆31Updated this week
Shwai-He / SparseUnifiedModel
View on GitHub
The official implementation of the paper "Understanding and Harnessing Sparsity in Unified Multimodal Models".
☆23Apr 25, 2026Updated 2 months ago
Shwai-He / SparseAdapter
View on GitHub
Source code of EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters"
☆23Feb 28, 2026Updated 4 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Shwai-He / MEO
View on GitHub
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆47Feb 28, 2026Updated 4 months ago
Shwai-He / VLM-Compression
View on GitHub
The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".
☆17Jul 2, 2024Updated 2 years ago
duterscmy / CD-MoE
View on GitHub
Official PyTorch implementation of CD-MOE
☆12Mar 18, 2026Updated 4 months ago
CASE-Lab-UMD / LLM-Drop
View on GitHub
The official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping (TMLR)".
☆191Apr 23, 2026Updated 3 months ago
ziyaow1010 / FedHyper
View on GitHub
Pytorch Code for FedHyper
☆11Aug 28, 2024Updated last year
yangyifei729 / LaCo
View on GitHub
Official implementation for LaCo (EMNLP 2024 Findings)
☆22Oct 3, 2024Updated last year
UNITES-Lab / MC-SMoE
View on GitHub
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆108Jun 20, 2025Updated last year
imagination-research / LCSC
View on GitHub
[ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
☆16Feb 15, 2025Updated last year
PKU-SEC-Lab / AdapMoE
View on GitHub
Code release for AdapMoE accepted by ICCAD 2024
☆39Apr 28, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
luuyin / OWL
View on GitHub
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
☆81Jul 7, 2025Updated last year
Lucky-Lance / Expert_Sparsity
View on GitHub
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆123May 24, 2024Updated 2 years ago
hahnyuan / ASVD4LLM
View on GitHub
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆92Oct 22, 2024Updated last year
alphadl / R1
View on GitHub
🚀enhanced GRPO with more verifiable rewards and real-time evaluators
☆37Jan 27, 2026Updated 5 months ago
fmfi-compbio / admm-pruning
View on GitHub
☆30Jul 22, 2024Updated 2 years ago
YiteWang / NTK-SAP
View on GitHub
[ICLR2023] NTK-SAP: Improving neural network pruning by aligning training dynamics
☆20May 1, 2023Updated 3 years ago
YJHMITWEB / ExFlow
View on GitHub
Explore Inter-layer Expert Affinity in MoE Model Inference
☆16May 6, 2024Updated 2 years ago
hyhuang00 / moe_inference
View on GitHub
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19Oct 30, 2024Updated last year
VITA-Group / ramanujan-on-pai
View on GitHub
[ICLR 2023] 'Revisiting Pruning At Initialization Through The Lens of Ramanujan Graph" by Duc Hoang, Shiwei Liu, Radu Marculescu, Atlas W…
☆14Aug 4, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Yibin-Lei / CSQE
View on GitHub
Implementation for EACL 2024 paper "Corpus-Steered Query Expansion with Large Language Models"
☆13Mar 19, 2024Updated 2 years ago
IST-DASLab / MoE-Quant
View on GitHub
Code for data-aware compression of DeepSeek models
☆75Dec 11, 2025Updated 7 months ago
gabrielolympie / moe-pruner
View on GitHub
A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size
☆87Sep 5, 2025Updated 10 months ago
rezashkv / diffusion_pruning
View on GitHub
[ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models.
☆15Feb 1, 2025Updated last year
pku-liang / ArkVale
View on GitHub
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆54Dec 17, 2024Updated last year
guanchuwang / Taylor-Unswift
View on GitHub
☆22Oct 3, 2024Updated last year
A-suozhang / MixDQ
View on GitHub
[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
☆14Nov 27, 2024Updated last year
VITA-Group / SMC-Bench
View on GitHub
[ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen…
☆28Aug 29, 2023Updated 2 years ago
XinbangZhang / DATA-NAS
View on GitHub
Codes for DATA: Differentiable ArchiTecture Approximation.
☆11Jul 22, 2021Updated 5 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ictnlp / TLAT-NMT
View on GitHub
Source code for the EMNLP 2020 long paper <Token-level Adaptive Training for Neural Machine Translation>.
☆20Oct 28, 2022Updated 3 years ago
Timothyxxx / NeuralSymbolicPapers
View on GitHub
☆14Aug 18, 2022Updated 3 years ago
MoE-Inf / awesome-moe-inference
View on GitHub
Curated collection of papers in MoE model inference
☆408Mar 12, 2026Updated 4 months ago
UNITES-Lab / HEXA-MoE
View on GitHub
Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"
☆15Mar 6, 2025Updated last year
zju-vipa / training_free_model_merging
View on GitHub
This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024).
☆34Mar 5, 2024Updated 2 years ago
LINs-lab / DynMoE
View on GitHub
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆161Jul 9, 2025Updated last year
dropbox / low-rank-llama2
View on GitHub
Low-Rank Llama Custom Training
☆23Mar 27, 2024Updated 2 years ago