lhallee / Multi_Head_Mixture_of_Experts__MH-MOE
☆28 · Updated 7 months ago
Alternatives and similar repositories for Multi_Head_Mixture_of_Experts__MH-MOE
Users interested in Multi_Head_Mixture_of_Experts__MH-MOE are comparing it to the repositories listed below.
- A repository for DenseSSMs ☆87 · Updated last year
- Official implementation of NeurIPS 2024 "Visual Fourier Prompt Tuning" ☆28 · Updated 4 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆49 · Updated 2 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆57 · Updated last year
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- ☆37 · Updated 10 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 7 months ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆35 · Updated 2 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆55 · Updated 9 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆94 · Updated this week
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆43 · Updated 5 months ago
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models ☆76 · Updated last year
- ☆48 · Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆56 · Updated 5 months ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning ☆16 · Updated this week
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 8 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 11 months ago
- ☆14 · Updated 8 months ago
- ☆23 · Updated last year
- The official implementation of "2024NeurIPS Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"☆46Updated 5 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆55 · Updated 2 months ago
- Official implementation for "Knowledge Distillation with Refined Logits" ☆14 · Updated 9 months ago
- Implementation of the AAAI 2022 paper: "Go Wider Instead of Deeper" ☆32 · Updated 2 years ago
- [ICCV23] Robust Mixture-of-Expert Training for Convolutional Neural Networks by Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Hua… ☆56 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆73 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆28 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- ☆15 · Updated 7 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆28 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 11 months ago