lhallee / Multi_Head_Mixture_of_Experts__MH-MOE
☆28 · Updated 7 months ago
Alternatives and similar repositories for Multi_Head_Mixture_of_Experts__MH-MOE
Users interested in Multi_Head_Mixture_of_Experts__MH-MOE are comparing it to the repositories listed below.
- A repository for DenseSSMs ☆87 · Updated last year
- Official implementation of NeurIPS 2024 "Visual Fourier Prompt Tuning" ☆28 · Updated 4 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆49 · Updated 2 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆57 · Updated last year
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- ☆37 · Updated 10 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 7 months ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆35 · Updated 2 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆55 · Updated 9 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆94 · Updated this week
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆43 · Updated 5 months ago
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models ☆76 · Updated last year
- ☆48 · Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆56 · Updated 5 months ago
- [NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning ☆16 · Updated this week
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 8 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 11 months ago
- ☆14 · Updated 8 months ago
- ☆23 · Updated last year
- The official implementation of "2024NeurIPS Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"☆46Updated 5 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆55 · Updated 2 months ago
- Official implementation for "Knowledge Distillation with Refined Logits" ☆14 · Updated 9 months ago
- Implementation of the AAAI 2022 paper: "Go Wider Instead of Deeper" ☆32 · Updated 2 years ago
- [ICCV23] Robust Mixture-of-Expert Training for Convolutional Neural Networks by Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Hua… ☆56 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆73 · Updated last year
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆28 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- ☆15 · Updated 7 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆28 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 11 months ago