uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆28 · Updated last year
Alternatives and similar repositories for MoE:
Users interested in MoE are comparing it to the repositories listed below.
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 8 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆25 · Updated 6 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆54 · Updated 4 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Official code for the ICLR 2024 paper "Non-negative Contrastive Learning" ☆45 · Updated last year
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆33 · Updated last year
- ☆18 · Updated 9 months ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] ☆52 · Updated last month
- Unofficial implementation of Conformal Language Modeling by Quach et al. ☆28 · Updated last year
- Recycling diverse models ☆44 · Updated 2 years ago
- ☆18 · Updated last month
- Code for the paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆24 · Updated 10 months ago
- ☆25 · Updated last year
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆35 · Updated 5 months ago
- Official PyTorch implementation for the NeurIPS'24 paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" ☆19 · Updated 2 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 · Updated 6 months ago
- Official repo of Progressive Data Expansion: data, code, and evaluation ☆28 · Updated last year
- Official implementation of ORCA, proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine" ☆71 · Updated last year
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 3 months ago
- ☆31 · Updated 3 months ago
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆42 · Updated 2 months ago
- "Why Do We Need Weight Decay in Modern Deep Learning?" [NeurIPS 2024] ☆66 · Updated 7 months ago
- Code for "Surgical Fine-Tuning Improves Adaptation to Distribution Shifts", published at ICLR 2023 ☆29 · Updated last year
- A repository for DenseSSMs ☆87 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 · Updated 3 weeks ago
- ☆28 · Updated 2 months ago
- ☆24 · Updated 3 weeks ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆54 · Updated 8 months ago
- [ACL 2023] Code for the paper "Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation" (https://arxiv.org/abs/2305.…) ☆38 · Updated last year