uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆34 · Updated 2 years ago
Alternatives and similar repositories for MoE
Users interested in MoE are comparing it to the repositories listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆81 · Updated 2 years ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆31 · Updated last year
- ☆50 · Updated 11 months ago
- Data Valuation without Training of a Model, submitted to ICLR'23 ☆22 · Updated 3 years ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆61 · Updated last year
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Updated 2 years ago
- State Space Models ☆71 · Updated last year
- MultiModN – Multimodal, Multi-Task, Interpretable Modular Networks (NeurIPS 2023) ☆35 · Updated 2 years ago
- Model Stock: All we need is just a few fine-tuned models ☆128 · Updated 4 months ago
- Official implementation of ORCA proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine" ☆74 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆45 · Updated 2 months ago
- ☆34 · Updated 10 months ago
- Unofficial Implementation of Selective Attention Transformer ☆20 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- ☆33 · Updated 11 months ago
- Optimal Transport in the Big Data Era ☆114 · Updated last year
- ☆203 · Updated last year
- Decomposing and Editing Predictions by Modeling Model Computation ☆139 · Updated last year
- A regression-like loss to improve numerical reasoning in language models (ICML 2025) ☆27 · Updated 4 months ago
- ☆152 · Updated last year
- Bayesian Low-Rank Adaptation for Large Language Models ☆36 · Updated last year
- Recycling diverse models ☆46 · Updated 2 years ago
- The official GitHub page for the paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…" ☆24 · Updated last year
- A curated list of Model Merging methods. ☆94 · Updated 3 weeks ago
- Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629> ☆64 · Updated 3 months ago
- Official Code for Paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation ☆132 · Updated 6 months ago
- Awesome list of papers that extend Mamba to various applications. ☆139 · Updated 6 months ago
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 11 months ago
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆46 · Updated last year
- ☆145 · Updated last year