lhallee / Multi_Head_Mixture_of_Experts__MH-MOE
☆19 · Updated last month
Related projects
Alternatives and complementary repositories for Multi_Head_Mixture_of_Experts__MH-MOE
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts"☆44Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆67Updated this week
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o…☆40Updated 11 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆102Updated 3 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆64Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆34Updated 4 months ago
- Fine-tuning Vision Transformers on various classification datasets☆91Updated 2 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆48 · Updated 2 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated last week
- A repository for DenseSSMs ☆88 · Updated 7 months ago
- ☆32 · Updated 5 months ago
- The source code of the EMNLP 2023 main conference paper: Sparse Low-rank Adaptation of Pre-trained Language Models. ☆69 · Updated 8 months ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… ☆83 · Updated this week
- [ICCV23] Robust Mixture-of-Expert Training for Convolutional Neural Networks by Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Hua… ☆42 · Updated last year
- PyTorch implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆59 · Updated this week
- Awesome-Low-Rank-Adaptation ☆33 · Updated 3 weeks ago
- Implementation of the AAAI 2022 paper: "Go Wider Instead of Deeper" ☆32 · Updated 2 years ago
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆53 · Updated last week
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆199 · Updated 5 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆49 · Updated last week
- HGRN2: Gated Linear RNNs with State Expansion ☆48 · Updated 2 months ago
- ☆41 · Updated 7 months ago
- Inference Speed Benchmark for "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆38 · Updated 3 months ago
- Awesome list of papers that extend Mamba to various applications. ☆127 · Updated last month
- State Space Models ☆62 · Updated 6 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆77 · Updated last month
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning ☆32 · Updated 3 months ago
- PyTorch implementation of LIMoE ☆51 · Updated 7 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 · Updated last month
- ☆29 · Updated 3 weeks ago
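For orientation, the idea tying this list to the parent repo is multi-head mixture-of-experts routing: each token is split into several sub-tokens along the feature dimension, each sub-token is routed to an expert independently, and the expert outputs are merged back into one token. The sketch below is a minimal, illustrative PyTorch version of that pattern; the class name `MHMoE`, the top-1 routing, and the expert MLP sizes are assumptions for exposition, not the reference implementation from the lhallee repo or the MH-MoE paper.

```python
import torch
import torch.nn as nn

class MHMoE(nn.Module):
    """Illustrative multi-head mixture-of-experts layer (sketch, not the
    reference implementation). Tokens are split into `num_heads` sub-tokens,
    each sub-token is routed top-1 to an expert, and results are merged."""

    def __init__(self, dim: int, num_experts: int = 8, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.heads_in = nn.Linear(dim, dim)    # mix features before the split
        self.router = nn.Linear(head_dim, num_experts)  # per-sub-token gating
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(head_dim, 4 * head_dim),
                nn.GELU(),
                nn.Linear(4 * head_dim, head_dim),
            )
            for _ in range(num_experts)
        ])
        self.heads_out = nn.Linear(dim, dim)   # mix features after the merge

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        b, s, d = x.shape
        h = self.num_heads
        # Split every token into h sub-tokens of size dim // h.
        sub = self.heads_in(x).reshape(b * s * h, d // h)
        # Top-1 routing: each sub-token picks its highest-probability expert.
        probs = self.router(sub).softmax(dim=-1)
        weight, expert_idx = probs.max(dim=-1)
        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(sub[mask])
        # Merge sub-tokens back into full tokens.
        return self.heads_out(out.reshape(b, s, d))

if __name__ == "__main__":
    layer = MHMoE(dim=64, num_experts=4, num_heads=4)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])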