kyegomez / SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆44 · Updated last week
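The core idea of the Switch Transformer paper is top-1 ("switch") routing: a learned router sends each token to exactly one expert FFN, with a capacity limit per expert so load stays balanced. The sketch below illustrates that mechanism in plain NumPy; all names (`switch_route`, `w_router`, `capacity_factor`) are illustrative assumptions, not this repository's API.

```python
import numpy as np

def switch_route(x, w_router, expert_weights, capacity_factor=1.0):
    """Minimal top-1 (switch) routing sketch.

    x: (tokens, d_model) token activations
    w_router: (d_model, n_experts) router projection
    expert_weights: one (d_model, d_model) matrix per expert
    Illustrative only -- not the repository's actual interface.
    """
    tokens, _ = x.shape
    n_experts = w_router.shape[1]

    logits = x @ w_router                       # router logits, (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax gate
    choice = probs.argmax(axis=-1)              # top-1 expert per token
    gate = probs[np.arange(tokens), choice]     # gate value scales the expert output

    # Each expert processes at most `capacity` tokens; overflow tokens are dropped
    # (passed through as zeros here), as in the paper's capacity-factor scheme.
    capacity = int(capacity_factor * tokens / n_experts)
    out = np.zeros_like(x)
    for e in range(n_experts):
        idx = np.where(choice == e)[0][:capacity]
        out[idx] = gate[idx, None] * (x[idx] @ expert_weights[e])
    return out, choice
```

Because only one expert runs per token, compute per token stays constant as the number of experts (and hence parameters) grows, which is how the paper scales to very large parameter counts.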
Related projects:
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆48 · Updated last week
- Inference speed benchmark for "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆33 · Updated 2 months ago
- A repository for DenseSSMs ☆86 · Updated 5 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆36 · Updated 9 months ago
- State Space Models ☆55 · Updated 4 months ago
- ☆119 · Updated last week
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆56 · Updated this week
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … ☆28 · Updated 6 months ago
- PyTorch implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers" ☆52 · Updated last week
- Minimal Mamba-2 implementation in PyTorch ☆89 · Updated 3 months ago
- ☆170 · Updated 9 months ago
- A Triton kernel for incorporating bi-directionality in Mamba2 ☆43 · Updated last week
- Awesome list of papers that extend Mamba to various applications ☆124 · Updated 3 weeks ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆94 · Updated last month
- Implementation of MoE-Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… ☆73 · Updated last week
- [ICML 2024 Oral] This project is the official implementation of our accurate LoRA-finetuning quantization of LLMs via information retenti… ☆55 · Updated 5 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆116 · Updated 4 months ago
- [EMNLP 2022] Official implementation of TransNormer from our EMNLP 2022 paper "The Devil in Linear Transformer" ☆53 · Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆38 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆33 · Updated 3 months ago
- ☆16 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆22 · Updated 3 months ago
- ☆19 · Updated 4 months ago
- Implementation of the AAAI 2022 paper "Go Wider Instead of Deeper" ☆32 · Updated last year
- ☆54 · Updated 2 months ago
- ☆97 · Updated last month
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆78 · Updated last year
- PyTorch implementation of "Simplified State Space Layers for Sequence Modeling" (S5) ☆58 · Updated 4 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆22 · Updated this week
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆42 · Updated last year