uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆31 · Updated last year
Alternatives and similar repositories for MoE
Users interested in MoE are comparing it to the repositories listed below.
- HGRN2: Gated Linear RNNs with State Expansion · ☆55 · Updated 10 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" · ☆29 · Updated 7 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) · ☆37 · Updated 7 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation · ☆40 · Updated 8 months ago
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] · ☆57 · Updated 6 months ago
- ☆19 · Updated 3 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆73 · Updated last year
- ☆32 · Updated 5 months ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… · ☆41 · Updated 7 months ago
- ☆20 · Updated 11 months ago
- Official code for the ICLR 2024 paper "Non-negative Contrastive Learning" · ☆45 · Updated last year
- ☆29 · Updated 4 months ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [to appear at ICLR 2025] · ☆19 · Updated last month
- [NeurIPS'24] Official PyTorch implementation for the paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling" · ☆20 · Updated 4 months ago
- Official code for the paper "Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation" · ☆80 · Updated 2 weeks ago
- Recycling diverse models · ☆44 · Updated 2 years ago
- ☆31 · Updated 8 months ago
- Official PyTorch implementation of "DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs" (ICML 2025 Oral) · ☆22 · Updated 2 weeks ago
- Awesome Learn From Model Beyond Fine-Tuning: A Survey · ☆67 · Updated 6 months ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] · ☆57 · Updated 2 months ago
- Unofficial implementation of Selective Attention Transformer · ☆17 · Updated 7 months ago
- ☆31 · Updated last year
- ☆43 · Updated 4 months ago
- Official code for the paper "Attention as a Hypernetwork" · ☆39 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 · ☆55 · Updated last year
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior · ☆40 · Updated last year
- ☆33 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) · ☆80 · Updated last year
- Data Valuation without Training of a Model, submitted to ICLR'23 · ☆22 · Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] · ☆66 · Updated 9 months ago