kyegomez / SwitchTransformersLinks

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

☆126

Alternatives and similar repositories for SwitchTransformers

Users that are interested in SwitchTransformers are comparing it to the libraries listed below

Sorting:

nanowell / Differential-Transformer-PyTorch
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model …
☆78Updated last year
withinmiaov / A-Survey-on-Mixture-of-Experts-in-LLMs
[TKDE'25] The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
☆440Updated 3 months ago
tommyip / mamba2-minimal
Minimal Mamba-2 implementation in PyTorch
☆226Updated last year
Chaos96 / fourierft
☆147Updated last year
transformer-vq / transformer_vq
☆197Updated last year
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆108Updated last week
AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆228Updated last week
fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …
☆181Updated last year
hkproj / mamba-notes
Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)
☆172Updated last year
kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆56Updated this week
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆138Updated 4 months ago
tensorgi / TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆401Updated this week
zugexiaodui / torch_flops
A library for calculating the FLOPs in the forward() process based on torch.fx
☆129Updated 6 months ago
kyegomez / SparseAttention
Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers"
☆91Updated last week
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
Adamdad / rational_kat_cu
☆75Updated 8 months ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆112Updated last week
Caiyun-AI / MUDDFormer
☆86Updated 5 months ago
Hprairie / Bi-Mamba2
A Triton Kernel for incorporating Bi-Directionality in Mamba2
☆75Updated 10 months ago
MzeroMiko / mamba-mini
An efficient pytorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivatio…
☆95Updated 2 weeks ago
RobinWu218 / ToST
[ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer)
☆123Updated 8 months ago
BICLab / MetaLA
Offical implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS2024 Oral)
☆30Updated 9 months ago
badripatro / mamba360
State Space Models
☆70Updated last year
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆330Updated 8 months ago
badripatro / simba
Simba
☆214Updated last year
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆106Updated 2 weeks ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆331Updated last month
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆137Updated 3 months ago
radarFudan / Awesome-state-space-models
Collection of papers on state-space models
☆602Updated last month
MuLabPKU / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆393Updated last month