uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆35 · Updated 2 years ago
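For orientation, below is a minimal sketch of the layer the paper studies: a top-k gated mixture-of-experts routes each token to a few expert MLPs and mixes their outputs with renormalized gate weights. This is an illustrative sketch, not the uclaml/MoE code; the class name `TopKMoE` and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Top-k gated MoE layer: route each token to k expert MLPs (illustrative)."""

    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router: token -> expert logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dim)
        gate_logits = self.gate(x)                        # (batch, num_experts)
        top_vals, top_idx = gate_logits.topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)             # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):         # dense loop; real systems dispatch sparsely
            for slot in range(self.k):
                mask = top_idx[:, slot] == e              # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoE(dim=64)
y = layer(torch.randn(8, 64))  # y: (8, 64)
```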
Alternatives and similar repositories for MoE
Users interested in MoE are comparing it to the repositories listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf); a minimal sketch of the technique appears after this list ☆82 · Updated 2 years ago
- Decomposing and Editing Predictions by Modeling Model Computation ☆139 · Updated last year
- State Space Models ☆72 · Updated last year
- ☆50 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 · Updated last year
- Model Stock: All we need is just a few fine-tuned models ☆129 · Updated 6 months ago
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Updated last year
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Updated 2 years ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆85 · Updated 10 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆213 · Updated last week
- Official implementation of ORCA proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine" ☆73 · Updated last year
- Optimal Transport in the Big Data Era ☆116 · Updated last year
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆47 · Updated last year
- ☆208 · Updated 2 years ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆35 · Updated 2 years ago
- Awesome list of papers that extend Mamba to various applications ☆138 · Updated 7 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆56 · Updated 3 months ago
- A repository for DenseSSMs ☆88 · Updated last year
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆57 · Updated last year
- A regression-like loss to improve numerical reasoning in language models (ICML 2025) ☆28 · Updated 5 months ago
- ☆35 · Updated 11 months ago
- ☆152 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Updated last year
- [NeurIPS 2023] Factorized Contrastive Learning: Going Beyond Multi-view Redundancy ☆74 · Updated 2 years ago
- Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629> ☆64 · Updated 4 months ago
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation ☆133 · Updated last month
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆49 · Updated 11 months ago
- Recycling diverse models ☆46 · Updated 3 years ago
- ☆48 · Updated last year
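For the Soft MoE entry above, here is a minimal sketch of the dispatch/combine idea from "From Sparse to Soft Mixtures of Experts": each slot receives a softmax-weighted mix of all tokens, experts run on the slots, and each token recombines the slot outputs with a second softmax. This is illustrative only, not the linked repository's code; the class name `SoftMoE` and its parameters are assumptions.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Soft MoE: tokens are softly dispatched to expert slots and recombined (illustrative)."""

    def __init__(self, dim: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts = num_experts
        num_slots = num_experts * slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, num_slots) * dim ** -0.5)  # slot embeddings
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, tokens, dim)
        logits = x @ self.phi                              # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)                   # per slot: convex mix over tokens
        combine = logits.softmax(dim=2)                    # per token: convex mix over slots
        slots = torch.einsum("bts,btd->bsd", dispatch, x)  # slot inputs
        chunks = slots.chunk(self.num_experts, dim=1)      # slots_per_expert slots per expert
        slots_out = torch.cat([f(c) for f, c in zip(self.experts, chunks)], dim=1)
        return torch.einsum("bts,bsd->btd", combine, slots_out)

layer = SoftMoE(dim=64)
y = layer(torch.randn(2, 16, 64))  # y: (2, 16, 64)
```

Because every token touches every slot with soft weights, the layer is fully differentiable and avoids the discrete routing of the top-k gate sketched earlier.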