bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆50Updated last year
Alternatives and similar repositories for soft-moe:
Users that are interested in soft-moe are comparing it to the libraries listed below
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023)☆24Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆29Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆69Updated last year
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆32Updated last year
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024).☆27Updated 10 months ago
- A repository for DenseSSMs☆86Updated 9 months ago
- Metrics for "Beyond neural scaling laws: beating power law scaling via data pruning " (NeurIPS 2022 Outstanding Paper Award)☆55Updated last year
- [CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong C…☆25Updated 2 years ago
- ImageNetV2 Pytorch Dataset☆39Updated last year
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models☆70Updated 10 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆48Updated last year
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning☆38Updated 9 months ago
- ☆98Updated 10 months ago
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data☆39Updated last year
- ☆77Updated last year
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"☆50Updated 5 months ago
- Official repo of Progressive Data Expansion: data, code and evaluation☆27Updated last year
- PyTorch implementation of LIMoE☆53Updated 9 months ago
- [NeurIPS 2023] Code release for "Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity"☆17Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models"☆138Updated 10 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆76Updated 10 months ago
- [NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization☆29Updated 4 months ago
- ☆21Updated 2 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling☆80Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆35Updated 7 months ago
- Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"☆98Updated last year
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 2 months ago
- ☆37Updated 11 months ago
- Mixture of Attention Heads☆41Updated 2 years ago
- On the Effectiveness of Parameter-Efficient Fine-Tuning☆38Updated last year