lhallee / Multi_Head_Mixture_of_Experts__MH-MOE
☆28 · Updated 8 months ago
Alternatives and similar repositories for Multi_Head_Mixture_of_Experts__MH-MOE
Users interested in Multi_Head_Mixture_of_Experts__MH-MOE are comparing it to the repositories listed below.
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts"☆57 · Updated last year
- A repository for DenseSSMs☆87 · Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆95 · Updated 2 weeks ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38 · Updated 8 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o…☆43 · Updated 6 months ago
- Implementation of AAAI 2022 Paper: Go wider instead of deeper☆32 · Updated 2 years ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation☆42 · Updated 3 months ago
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24 · Updated 3 weeks ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24)☆52 · Updated 3 months ago
- Official implementation of NeurIPS 2024 "Visual Fourier Prompt Tuning"☆28 · Updated 5 months ago
- A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity☆28 · Updated last month
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning☆17 · Updated 3 weeks ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆28 · Updated 2 months ago
- ☆38 · Updated 11 months ago
- State Space Models☆67 · Updated last year
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…☆108 · Updated 2 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248☆55 · Updated last year
- ☆21 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆35 · Updated last year
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆65 · Updated last year
- ☆63 · Updated 4 months ago
- ☆47 · Updated last year
- Adapting LLaMA Decoder to Vision Transformer☆28 · Updated last year
- More dimensions = More fun☆22 · Updated 11 months ago
- Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation. NeurIPS 2022.☆32 · Updated 2 years ago
- ☆55 · Updated 6 months ago
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models☆78 · Updated last year
- Triton implementation of bi-directional (non-causal) linear attention☆50 · Updated 4 months ago
- ☆14 · Updated 2 weeks ago
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (NeurIPS 2024).☆38 · Updated 8 months ago
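The multi-head mixture-of-experts idea that the listed repositories revolve around (split each token into sub-tokens, route every sub-token to an expert independently, then merge the results) can be sketched roughly as below. This is a minimal illustrative sketch with dense top-1 routing, not the API of lhallee's repository or of the MH-MoE paper; all names and shapes are assumptions.

```python
import numpy as np

def mh_moe_layer(x, w_gate, experts, num_heads):
    """Toy multi-head MoE forward pass (top-1 routing, no load balancing).

    x:        (batch, d_model) token embeddings
    w_gate:   (d_model // num_heads, num_experts) router weights
    experts:  list of (d_head, d_head) expert weight matrices
    """
    batch, d_model = x.shape
    d_head = d_model // num_heads

    # 1. Split each token into `num_heads` sub-tokens of width d_head.
    sub = x.reshape(batch * num_heads, d_head)

    # 2. Route every sub-token: softmax gate over experts, keep the top-1.
    logits = sub @ w_gate
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)

    # 3. Apply each chosen expert, scaled by its gate probability.
    out = np.empty_like(sub)
    for e, w in enumerate(experts):
        mask = choice == e
        out[mask] = (sub[mask] @ w) * probs[mask, e:e + 1]

    # 4. Merge the sub-tokens back into full-width tokens.
    return out.reshape(batch, d_model)

# Hypothetical usage: 4 tokens of width 8, 2 heads, 3 experts of width 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((4, 3))
experts = [rng.standard_normal((4, 4)) for _ in range(3)]
y = mh_moe_layer(x, w_gate, experts, num_heads=2)
```

The key design point is that routing happens at sub-token granularity, so a single token can consult several experts at once, which is the main distinction from a standard (single-head) MoE layer.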