bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆57 · Updated last year
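The paper's core idea is a fully differentiable routing scheme: every token is softly dispatched into a fixed number of expert "slots", and each token's output is a soft combination of all slot outputs. Below is a minimal, illustrative PyTorch sketch of that dispatch/combine computation, assuming one MLP per expert and one learnable query vector per slot; the class and parameter names (`SoftMoE`, `slot_embeds`, etc.) are hypothetical and do not reflect this repository's actual API.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal Soft MoE layer sketch (dispatch/combine as in arXiv:2308.00951).

    Illustrative only; names and structure are assumptions, not this repo's API.
    """
    def __init__(self, dim, num_experts=4, slots_per_expert=1, hidden_mult=4):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        # One learnable query vector per slot (Phi in the paper).
        self.slot_embeds = nn.Parameter(torch.randn(num_experts * slots_per_expert, dim))
        # Each expert is a small MLP applied to its slots.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * hidden_mult), nn.GELU(), nn.Linear(dim * hidden_mult, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, tokens, dim)
        # Token-slot affinities.
        logits = torch.einsum("btd,sd->bts", x, self.slot_embeds)       # (b, tokens, slots)
        dispatch = logits.softmax(dim=1)  # normalize over tokens: each slot is a convex mix of tokens
        combine = logits.softmax(dim=2)   # normalize over slots: each token mixes all slot outputs
        # Build slot inputs as weighted averages of tokens.
        slots = torch.einsum("bts,btd->bsd", dispatch, x)                # (b, slots, dim)
        slots = slots.reshape(x.size(0), self.num_experts, self.slots_per_expert, -1)
        # Each expert processes only its own slots.
        expert_out = torch.stack([f(slots[:, i]) for i, f in enumerate(self.experts)], dim=1)
        expert_out = expert_out.reshape(x.size(0), -1, x.size(-1))       # (b, slots, dim)
        # Scatter slot outputs back to tokens.
        return torch.einsum("bts,bsd->btd", combine, expert_out)         # (b, tokens, dim)
```

For example, `SoftMoE(dim=256, num_experts=4)(torch.randn(2, 16, 256))` returns a tensor with the same shape as its input, since every token receives a weighted combination of all slot outputs.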
Alternatives and similar repositories for soft-moe
Users interested in soft-moe are comparing it to the repositories listed below
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆73 · Updated last year
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023 ☆33 · Updated last year
- ☆87 · Updated 2 years ago
- [CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong C… ☆25 · Updated 3 years ago
- Distributed Optimization Infra for learning CLIP models ☆26 · Updated 8 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆30 · Updated 2 years ago
- Implementation of the paper "Training Free Pretrained Model Merging" (CVPR 2024) ☆28 · Updated last year
- Code and benchmark for the paper "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆56 · Updated 5 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆49 · Updated 2 months ago
- A repository for DenseSSMs ☆87 · Updated last year
- Official implementation of the AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers" ☆105 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆85 · Updated 2 years ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆90 · Updated 2 years ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 3 months ago
- PyTorch implementation of LIMoE ☆53 · Updated last year
- [ICLR 2025] Official implementation of "Autoregressive Pretraining with Mamba in Vision" ☆77 · Updated this week
- Official code for the paper "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- ☆21 · Updated 2 years ago
- Repository containing code for blockwise SSL training ☆29 · Updated 7 months ago
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023) ☆29 · Updated last year
- On the Effectiveness of Parameter-Efficient Fine-Tuning ☆38 · Updated last year
- The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024) ☆46 · Updated 5 months ago
- Mixture of Attention Heads ☆44 · Updated 2 years ago
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning" (NeurIPS 2022 Outstanding Paper Award) ☆56 · Updated 2 years ago
- [ICLR 2024] Exploring Target Representations for Masked Autoencoders ☆55 · Updated last year
- ☆50 · Updated 4 months ago
- ☆109 · Updated last year
- ☆27 · Updated last year
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆35 · Updated 2 months ago
- A curated list of Model Merging methods ☆92 · Updated 8 months ago