bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆53 · Updated last year
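For readers comparing these implementations, here is a minimal sketch of the Soft MoE routing scheme the paper describes: each slot is a learnable query over the input tokens, every expert processes its own group of slots, and slot outputs are softly mixed back into per-token outputs. This is an illustrative toy under assumed names (`SoftMoE`, `slot_embeds`, the expert MLPs), not the API of this repo or of any repo listed below.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Toy Soft MoE layer: soft dispatch of tokens to expert slots,
    then soft combination of slot outputs back into tokens."""

    def __init__(self, dim, num_experts=4, slots_per_expert=1, hidden_mult=4):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        # One learnable d-dimensional parameter vector per slot (hypothetical name).
        self.slot_embeds = nn.Parameter(torch.randn(num_experts * slots_per_expert, dim))
        # Each expert is a small MLP applied independently to its slots.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(dim, dim * hidden_mult),
                nn.GELU(),
                nn.Linear(dim * hidden_mult, dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (batch, tokens, dim)
        b, t, d = x.shape
        logits = torch.einsum("btd,sd->bts", x, self.slot_embeds)  # (b, tokens, slots)
        dispatch = logits.softmax(dim=1)  # over tokens: each slot is a convex mix of tokens
        combine = logits.softmax(dim=2)   # over slots: each token is a convex mix of slots
        slots = torch.einsum("bts,btd->bsd", dispatch, x)          # (b, slots, dim)
        slots = slots.reshape(b, self.num_experts, self.slots_per_expert, d)
        # Route each expert's group of slots through that expert.
        outs = torch.stack(
            [expert(slots[:, i]) for i, expert in enumerate(self.experts)], dim=1
        ).reshape(b, -1, d)                                        # (b, slots, dim)
        return torch.einsum("bts,bsd->btd", combine, outs)         # (b, tokens, dim)
```

Usage: `SoftMoE(dim=256)(torch.randn(2, 196, 256))` returns a `(2, 196, 256)` tensor; because both dispatch and combine weights are dense softmaxes, every token influences every slot and vice versa, which is what makes the routing fully differentiable.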
Alternatives and similar repositories for soft-moe:
Users interested in soft-moe are comparing it to the repositories listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆71 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆31 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling☆82 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆49 · Updated 2 years ago
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models☆73 · Updated last year
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023)☆27 · Updated last year
- Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"☆104 · Updated last year
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award)☆56 · Updated last year
- ☆85 · Updated 2 years ago
- ☆21 · Updated 2 years ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization☆29 · Updated 2 years ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023☆32 · Updated last year
- ☆27 · Updated last year
- ☆102 · Updated last year
- Mixture of Attention Heads☆44 · Updated 2 years ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers☆25 · Updated last month
- Implementation of the paper "Training Free Pretrained Model Merging" (CVPR 2024)☆29 · Updated last year
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆37 · Updated 6 months ago
- Code for "NOLA: Compressing LoRA using Linear Combination of Random Basis"☆53 · Updated 7 months ago
- A repository for DenseSSMs☆87 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion☆54 · Updated 7 months ago
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (NeurIPS 2024)☆37 · Updated 5 months ago
- ImageNetV2 PyTorch Dataset☆41 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch☆282 · Updated last week
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24)☆45 · Updated last month
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆86 · Updated this week
- [CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong C…☆25 · Updated 3 years ago
- On the Effectiveness of Parameter-Efficient Fine-Tuning☆38 · Updated last year
- PyTorch implementation of LIMoE☆53 · Updated last year
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States☆66 · Updated 8 months ago