vulus98 / Rethinking-attention
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. IWSLT pretrained models are currently included.
☆44Updated 2 months ago
Alternatives and similar repositories for Rethinking-attention:
Users that are interested in Rethinking-attention are comparing it to the libraries listed below
- State Space Models☆64Updated 9 months ago
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"☆51Updated 2 weeks ago
- [ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-paramete…☆88Updated 5 months ago
- ☆45Updated 10 months ago
- ☆47Updated last week
- A repository for DenseSSMs☆86Updated 10 months ago
- A Triton Kernel for incorporating Bi-Directionality in Mamba2☆60Updated last month
- [NeurIPS 2023] Lightweight Vision Transformer with Bidirectional Interaction☆23Updated last year
- ☆23Updated last year
- Simba☆200Updated 10 months ago
- Official Implementation for Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data☆56Updated 7 months ago
- ☆65Updated 3 months ago
- Code Implementation of EfficientVMamba☆194Updated 10 months ago
- Official implementation for "Knowledge Distillation with Refined Logits".☆13Updated 5 months ago
- An efficient pytorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivatio…☆79Updated 11 months ago
- This is the official code for paper: Token Summarisation for Efficient Vision Transformers via Graph-based Token Propagation☆26Updated last year
- (NeurIPS 2023) PyTorch implementation of "Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation"☆19Updated 4 months ago
- A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series…☆27Updated 3 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning☆39Updated 6 months ago
- Awesome list of papers that extend Mamba to various applications.☆131Updated last month
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…☆104Updated last year
- Official repository for CVPR24 Precognition Workshop Paper: VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotem…☆118Updated 10 months ago
- ☆16Updated last year
- ☆34Updated last year
- Second Generation of the MAMBA Software☆28Updated 4 months ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models"☆211Updated 8 months ago
- Transformer model based on the Kolmogorov–Arnold Network (KAN), which is an alternative to the Multi-Layer Perceptron (MLP)☆27Updated 2 months ago
- Minimal Mamba-2 implementation in PyTorch☆170Updated 7 months ago
- Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch☆30Updated 3 months ago
- ☆55Updated 11 months ago