uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆34 · Updated 2 years ago
Alternatives and similar repositories for MoE
Users interested in MoE are comparing it to the libraries listed below.
- PyTorch implementation of Soft MoE by Google Brain from "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆82 · Updated 2 years ago
- ☆50 · Updated 11 months ago
- State Space Models ☆71 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- Decomposing and Editing Predictions by Modeling Model Computation ☆139 · Updated last year
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆35 · Updated 2 years ago
- A repository for DenseSSMs ☆88 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 · Updated last year
- ☆152 · Updated last year
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆85 · Updated 9 months ago
- Integrating Mamba/SSMs with Transformers for enhanced long-context, high-quality sequence modeling ☆213 · Updated last week
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆56 · Updated 3 months ago
- Official implementation of ORCA, proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine" ☆74 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Updated last year
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆231 · Updated 3 months ago
- Unofficial implementation of the Selective Attention Transformer ☆20 · Updated last year
- ☆34 · Updated 11 months ago
- Model Stock: All we need is just a few fine-tuned models ☆128 · Updated 5 months ago
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆48 · Updated 10 months ago
- Official code for the ICLR 2024 paper "Non-negative Contrastive Learning" ☆46 · Updated last year
- Official repo of Progressive Data Expansion: data, code, and evaluation ☆29 · Updated 2 years ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆32 · Updated 9 months ago
- Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629> ☆65 · Updated 4 months ago
- Implementation of Infini-Transformer in PyTorch ☆112 · Updated last year
- Awesome list of papers that extend Mamba to various applications ☆138 · Updated 7 months ago
- A regression-like loss to improve numerical reasoning in language models (ICML 2025) ☆27 · Updated 5 months ago
- ☆204 · Updated last year
- Conference schedule, top papers, and analysis of the data for NeurIPS 2023 ☆120 · Updated 2 years ago
- ☆91 · Updated last year
- Official PyTorch implementation of "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025) ☆31 · Updated 8 months ago