vulus98 / Rethinking-attention

My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. IWSLT pretrained models are currently included.

☆44

Alternatives and similar repositories for Rethinking-attention

Users who are interested in Rethinking-attention are comparing it to the libraries listed below.

badripatro / mamba360
State Space Models
☆71 Updated last year
MambaMixer / M2
☆48 Updated last year
badripatro / simba
Simba
☆214 Updated last year
xinghaochen / SLAB
[ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-paramete…
☆109 Updated last year
Adamdad / rational_kat_cu
☆76 Updated 9 months ago
MzeroMiko / mamba-mini
An efficient pytorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivatio…
☆98 Updated last month
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆115 Updated last month
ZacharyMeng / PolaFormer
Official repository of Polarity-aware Linear Attention for Vision Transformers (ICLR 2025)
☆78 Updated last month
NVlabs / ConvSSM
☆69 Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆89 Updated last year
jacklishufan / Mamba-ND
Official Implementation for Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
☆64 Updated last year
kyegomez / ViTAR
Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch
☆38 Updated last year
TerryPei / EfficientVMamba
Code Implementation of EfficientVMamba
☆237 Updated last year
Caiyun-AI / DCFormer
☆220 Updated 9 months ago
AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆229 Updated last month
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆139 Updated 5 months ago
kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆56 Updated last month
okojoalg / dfformer
☆68 Updated last year
GATECH-EIC / Castling-ViT
[CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
☆30 Updated last year
scale-lab / MTLoRA
The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24)
☆69 Updated 4 months ago
microsoft / TokenMixers
☆152 Updated last year
OSVAI / KernelWarehouse
The official project website of "KernelWarehouse: Rethinking the Design of Dynamic Convolution" (KW for short, published in ICML 2024)
☆101 Updated last year
kyegomez / SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…
☆133 Updated last month
Itamarzimm / UnifiedImplicitAttnRepr
[ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
☆47 Updated 8 months ago
nanowell / Differential-Transformer-PyTorch
PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model …
☆79 Updated last year
kyegomez / Simba
A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series…
☆28 Updated last year
Hprairie / Bi-Mamba2
A Triton Kernel for incorporating Bi-Directionality in Mamba2
☆75 Updated 11 months ago
BGU-CS-VIL / DiTAC
Trainable Highly-expressive Activation Functions. ECCV 2024
☆38 Updated 9 months ago
maclong01 / DeBiFormer
[ACCV 2024] Official code for "DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention"
☆31 Updated 10 months ago
yyyujintang / VMRNN-PyTorch
Official repository for CVPR24 Precognition Workshop Paper: VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotem…
☆153 Updated last year