uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆33Updated 2 years ago
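The repository above studies the Mixture-of-Experts (MoE) layer. As a minimal illustrative sketch (not the repository's actual code), the core idea is a gating network that scores every expert, dispatches each input only to the top-k experts, and combines their outputs with the renormalized gate weights; all function and parameter names below are made up for illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts, k=2):
    """Toy sparse MoE forward pass on a scalar input.

    gate_weights: one gating weight per expert (illustrative linear gate)
    experts: list of callables, one per expert
    k: number of experts actually executed (sparse dispatch)
    """
    scores = softmax([w * x for w in gate_weights])
    # Keep only the k experts with the highest gate score.
    topk = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the surviving gate weights and mix the expert outputs.
    norm = sum(scores[i] for i in topk)
    return sum(scores[i] / norm * experts[i](x) for i in topk)

# Three toy "experts"; only two of them run per input.
experts = [lambda x: 2 * x, lambda x: -x, lambda x: x + 1]
y = moe_forward(1.0, gate_weights=[0.5, -0.3, 0.1], experts=experts, k=2)
```

Real implementations replace the scalar gate with a learned linear layer over token embeddings and batch the expert computation, but the routing logic follows this shape.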
Alternatives and similar repositories for MoE
Users interested in MoE compare it to the libraries listed below
- State Space Models ☆71 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆81 · Updated 2 years ago
- ☆50 · Updated 10 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆211 · Updated 2 months ago
- A repository for DenseSSMs ☆89 · Updated last year
- ☆201 · Updated last year
- Awesome list of papers that extend Mamba to various applications. ☆139 · Updated 6 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆56 · Updated last month
- A curated list of Model Merging methods. ☆94 · Updated 2 weeks ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 · Updated last year
- A regression-alike loss to improve numerical reasoning in language models (ICML 2025) ☆27 · Updated 4 months ago
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆35 · Updated 2 years ago
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models ☆85 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- Decomposing and Editing Predictions by Modeling Model Computation ☆139 · Updated last year
- ☆152 · Updated last year
- Model Stock: All we need is just a few fine-tuned models ☆128 · Updated 4 months ago
- Official repo of Progressive Data Expansion: data, code, and evaluation ☆29 · Updated 2 years ago
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Updated last year
- Data Valuation without Training of a Model, submitted to ICLR'23 ☆22 · Updated 2 years ago
- Conference schedule, top papers, and analysis of the data for NeurIPS 2023! ☆121 · Updated 2 years ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆231 · Updated 2 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆31 · Updated 8 months ago
- ☆34 · Updated 10 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆47 · Updated 9 months ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆80 · Updated 8 months ago
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆46 · Updated last year
- ☆30 · Updated 2 years ago
- Optimal Transport in the Big Data Era ☆113 · Updated last year
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … ☆39 · Updated last year