uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆27 · Updated last year
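For context, a mixture-of-experts layer routes each input through one (or a few) expert sub-networks selected by a learned gate, rather than through a single dense layer. The sketch below is a minimal, hypothetical top-1 MoE with linear experts in NumPy; it illustrates the general idea only and is not this repository's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Illustrative top-1 mixture-of-experts layer (linear experts)."""
    def __init__(self, d_in, d_out, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        # One linear expert per slot, plus a linear router.
        self.experts = [rng.standard_normal((d_in, d_out)) * 0.1
                        for _ in range(n_experts)]
        self.router = rng.standard_normal((d_in, n_experts)) * 0.1

    def __call__(self, x):
        # The router produces a distribution over experts for each input.
        gates = softmax(x @ self.router)          # (batch, n_experts)
        top1 = gates.argmax(axis=-1)              # hard top-1 routing
        out = np.zeros((x.shape[0], self.experts[0].shape[1]))
        for i, e in enumerate(top1):
            # Each input is processed only by its selected expert,
            # scaled by that expert's gate value.
            out[i] = gates[i, e] * (x[i] @ self.experts[e])
        return out

x = np.random.default_rng(1).standard_normal((4, 8))
moe = MoELayer(d_in=8, d_out=16, n_experts=4)
y = moe(x)
print(y.shape)  # (4, 16)
```

Because only the selected expert runs per input, compute stays roughly constant as the number of experts (and hence parameters) grows, which is the main appeal of sparse MoE layers.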
Alternatives and similar repositories for MoE:
Users interested in MoE are comparing it to the repositories listed below.
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 7 months ago
- Model Stock: All we need is just a few fine-tuned models ☆107 · Updated 6 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models" ☆12 · Updated 9 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 5 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 · Updated 6 months ago
- Recycling diverse models ☆44 · Updated 2 years ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆52 · Updated last week
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆42 · Updated 11 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆54 · Updated 4 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Official repo of Progressive Data Expansion: data, code and evaluation ☆28 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆25 · Updated this week
- State Space Models ☆67 · Updated 11 months ago
- Code for "Surgical Fine-Tuning Improves Adaptation to Distribution Shifts" published at ICLR 2023 ☆29 · Updated last year
- I2M2: Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning (NeurIPS 2024) ☆19 · Updated 5 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆42 · Updated last month
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆33 · Updated last year
- Awesome Learn From Model Beyond Fine-Tuning: A Survey ☆62 · Updated 4 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆51 · Updated 9 months ago
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St… ☆21 · Updated 11 months ago
- Official PyTorch implementation for NeurIPS'24 paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" ☆19 · Updated last month
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" ☆38 · Updated 5 months ago