YeonwooSung / Pytorch_mixture-of-expertsLinks

PyTorch implementation of moe, which stands for mixture of experts

☆50

Alternatives and similar repositories for Pytorch_mixture-of-experts

Users that are interested in Pytorch_mixture-of-experts are comparing it to the libraries listed below

Sorting:

fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78Updated 2 years ago
knotgrass / attention
several types of attention modules written in PyTorch for learning purposes
☆52Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 9 months ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119Updated last year
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆113Updated last week
kyegomez / SparseAttention
Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers"
☆91Updated last week
krafton-ai / mambaformer-icl
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
☆57Updated last year
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆100Updated last year
kyegomez / MultiQueryAttention
This is a simple torch implementation of the high performance Multi-Query Attention
☆15Updated 2 years ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39Updated 3 years ago
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆177Updated last year
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in Pytorch
☆91Updated last year
Montinger / Transformer-Workbench
Playground for Transformers
☆53Updated last year
tum-ai / number-token-loss
A regression-alike loss to improve numerical reasoning in language models - ICML 2025
☆26Updated 2 months ago
lucidrains / sinkhorn-router-pytorch
Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise
☆39Updated last year
lucidrains / AMIE-pytorch
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind
☆68Updated last year
UCDvision / NOLA
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆56Updated last year
facebookresearch / ViP-MAE
This is a PyTorch implementation of the paperViP A Differentially Private Foundation Model for Computer Vision
☆36Updated 2 years ago
uclaml / MoE
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
☆32Updated last year
badripatro / mamba360
State Space Models
☆70Updated last year
OscarXZQ / weight-selection
☆186Updated last year
naver-ai / model-stock
Model Stock: All we need is just a few fine-tuned models
☆125Updated 2 months ago
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆96Updated 10 months ago
rasbt / cvpr2023
☆134Updated 2 years ago
alenic / timm-models-explorer
Timm model explorer
☆42Updated last year
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆179Updated 7 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
The-AI-Summer / pytorch-ddp
code for the ddp tutorial
☆32Updated 3 years ago