uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆25 · Updated last year
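For readers new to the topic, the sketch below illustrates the kind of sparsely gated Mixture-of-Experts layer this line of work studies: a learned gating network scores the experts for each token, only the top-k experts are evaluated, and their outputs are combined with the renormalized gate weights. The expert architecture, dimensions, and top-k routing shown here are illustrative assumptions, not code from this repository.

```python
# Minimal sketch of a sparsely gated MoE layer with top-k routing.
# Expert width, number of experts, and the renormalization over the
# selected gates are illustrative choices, not uclaml/MoE's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Linear gating network: one logit per expert.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is a small two-layer feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                            # (num_tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)             # renormalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out


if __name__ == "__main__":
    layer = MoELayer(d_model=16, d_hidden=32, num_experts=4, top_k=2)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])
```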
Alternatives and similar repositories for MoE:
Users interested in MoE are comparing it to the libraries listed below:
- HGRN2: Gated Linear RNNs with State Expansion ☆53 · Updated 7 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 · Updated 4 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆18 · Updated 8 months ago
- Official repo of Progressive Data Expansion: data, code and evaluation ☆28 · Updated last year
- ☆30 · Updated 2 months ago
- ☆18 · Updated this week
- ☆74 · Updated 7 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆54 · Updated 3 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 5 months ago
- Recycling diverse models ☆44 · Updated 2 years ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated 2 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆35 · Updated 4 months ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆31 · Updated 3 weeks ago
- Official PyTorch implementation for NeurIPS'24 paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" ☆19 · Updated last month
- ☆27 · Updated last month
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 · Updated 2 years ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆24 · Updated 8 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" ☆12 · Updated 9 months ago
- ☆30 · Updated 5 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆26 · Updated 11 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆38 · Updated 5 months ago
- [ACL 2023] Code for the paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation” (https://arxiv.org/abs/2305.…) ☆38 · Updated last year
- ☆21 · Updated 2 years ago
- Code for "Merging Text Transformers from Different Initializations" ☆19 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 5 months ago
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆41 · Updated 3 weeks ago
- Official code for the paper "Attention as a Hypernetwork" ☆25 · Updated 9 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 · Updated 6 months ago