uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆31 · Updated last year
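For context on the technique the featured repository studies, here is a minimal PyTorch sketch of a sparsely gated top-k mixture-of-experts layer. The class name, sizes, and routing details are illustrative assumptions, not the code from uclaml/MoE.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Sparsely gated top-k mixture-of-experts layer (illustrative sketch only,
    not the uclaml/MoE implementation)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router over experts
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.ReLU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                              # (num_tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # each token picks its k experts
        weights = F.softmax(topk_vals, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # no tokens routed to this expert
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


# Usage: 16 tokens of width 32, 4 experts, top-2 routing.
layer = TopKMoE(d_model=32, d_hidden=64, num_experts=4, k=2)
print(layer(torch.randn(16, 32)).shape)  # torch.Size([16, 32])
```

The per-expert loop keeps the sketch readable; production implementations typically batch tokens per expert or use fused dispatch kernels instead.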
Alternatives and similar repositories for MoE
Users interested in MoE are comparing it to the repositories listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆30 · Updated 11 months ago
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆58 · Updated 10 months ago
- State Space Models ☆70 · Updated last year
- Decomposing and Editing Predictions by Modeling Model Computation ☆138 · Updated last year
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆47 · Updated 7 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated last year
- Model Stock: All we need is just a few fine-tuned models ☆125 · Updated 2 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆206 · Updated last week
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆33 · Updated 2 years ago
- ☆191 · Updated last year
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆68 · Updated 6 months ago
- ☆147 · Updated last year
- Optimal Transport in the Big Data Era ☆111 · Updated 11 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆56 · Updated last week
- ☆33 · Updated 8 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆91 · Updated 3 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆32 · Updated 7 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆90 · Updated last year
- Official implementation of ORCA proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine" ☆72 · Updated last year
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆46 · Updated last year
- ☆50 · Updated 8 months ago
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 9 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated last year
- Official implementation for Equivariant Architectures for Learning in Deep Weight Spaces [ICML 2023] ☆89 · Updated 2 years ago
- Official implementation for Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models ☆30 · Updated 2 years ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 · Updated 6 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆106 · Updated last week
- A curated list of Model Merging methods ☆92 · Updated last year