cat538/MxMoE

[ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

☆30

Alternatives and similar repositories for MxMoE

Users who are interested in MxMoE are comparing it to the libraries listed below.

Supercomputing-System-AI-Lab / MiLo
Code repo for efficient quantized MoE inference with mixture of low-rank compensators
☆39 · Apr 14, 2025 · Updated last year
chenzx921020 / MoEQuant
☆17 · Apr 7, 2025 · Updated last year
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC#
☆75 · Feb 12, 2025 · Updated last year
Summer-Summer / Kitty
Algorithm-System Co-design: accurate and efficient 2-bit KV cache quantization for LLM Inference.
☆17 · May 20, 2026 · Updated 2 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 · Jun 12, 2024 · Updated 2 years ago
NVlabs / EoRA
[ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
☆49 · Apr 21, 2026 · Updated 3 months ago
IST-DASLab / MoE-Quant
Code for data-aware compression of DeepSeek models
☆75 · Dec 11, 2025 · Updated 7 months ago
Yeyke / HBLLM
[NeurIPS 2025 (spotlight)] HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
☆16 · Dec 17, 2025 · Updated 7 months ago
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆123 · May 24, 2024 · Updated 2 years ago
UNITES-Lab / C2R-MoE
[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…
☆16 · Feb 4, 2025 · Updated last year
SJTU-ReArch-Group / M2XFP_ASPLOS26
[ASPLOS 2026] M2XFP: A Metadata-Augmented Microscaling Data Format for Efficient Low-bit Quantization.
☆15 · Jan 29, 2026 · Updated 5 months ago
apple / ml-epicache
☆30 · Oct 2, 2025 · Updated 9 months ago
cat538 / SKVQ
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆24 · Oct 5, 2024 · Updated last year
FPSG-UIUC / micro24-fusemax-artifact
MICRO 2024 Evaluation Artifact for FuseMax
☆17 · Aug 26, 2024 · Updated last year
IST-DASLab / MatGPTQ
Code for MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization
☆22 · Feb 18, 2026 · Updated 5 months ago
Anonymous1252022 / fp4-all-the-way
☆52 · May 20, 2025 · Updated last year
tim-lawson / skip-middle
Learning to Skip the Middle Layers of Transformers
☆17 · Aug 7, 2025 · Updated 11 months ago
NYCU-EDgeAi / subspec
[NeurIPS 2025] Speculate Deep and Accurate
☆23 · Jan 16, 2026 · Updated 6 months ago
DoubtedSteam / RoE
The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models"
☆17 · Mar 24, 2025 · Updated last year
akhilkedia / TranformersGetStable
[ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"
☆11 · Jul 19, 2024 · Updated 2 years ago
grasses / RemovalNet
Code for paper: "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023.
☆10 · Nov 27, 2023 · Updated 2 years ago
IST-DASLab / spdy
Code for ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees"
☆20 · May 3, 2023 · Updated 3 years ago
PanZaifeng / KVFlow
☆28 · Mar 12, 2026 · Updated 4 months ago
GATECH-EIC / ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆321 · Updated this week
aiha-lab / MX-QLLM
LLM Inference with Microscaling Format
☆35 · Nov 12, 2024 · Updated last year
phuocphn / uniq
PyTorch implementation of our UniQ method, IEEE Access -- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric …
☆11 · Apr 7, 2021 · Updated 5 years ago
yonsei-sslab / asgard
The artifact for NDSS '25 paper "ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environmen…
☆16 · Oct 16, 2025 · Updated 9 months ago
KaiLv69 / DuoDecoding
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
☆19 · Mar 4, 2025 · Updated last year
LeiWang1999 / EthernetVideo
Use FPGA to Transfer Image with Gigabits Ethernet
☆19 · Dec 2, 2020 · Updated 5 years ago
raiyyanfaisal09 / Router1X3_RTL_Design
A 1X3 Router (capable of routing data packets to three different clients from a single source network) was designed, including a re…
☆11 · Jun 3, 2019 · Updated 7 years ago
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56 · Aug 6, 2025 · Updated 11 months ago
SamsungLabs / PMPD
Codebase for the Progressive Mixed-Precision Decoding paper.
☆22 · Jul 15, 2025 · Updated last year
KULeuven-MICAS / snax-gemm
☆17 · Jul 1, 2024 · Updated 2 years ago
Compass-All / NDSS24-CAGE
☆16 · Jan 5, 2024 · Updated 2 years ago
Megum1 / LOTUS
[CVPR'24] LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
☆15 · Apr 17, 2026 · Updated 3 months ago
RiS3-Lab / ShadowNet
Open-source code and data for ShadowNet (S&P Oakland'23)
☆12 · Mar 11, 2024 · Updated 2 years ago
jeffreyyu0602 / quantized-training
☆35 · Dec 22, 2025 · Updated 7 months ago
JimyMa / FuncTs
[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
☆15 · Jan 13, 2024 · Updated 2 years ago