knotgrass / attention
Several types of attention modules written in PyTorch for learning purposes
☆40 · Updated last month
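For orientation, below is a minimal sketch of the kind of module the repository description refers to: plain multi-head scaled dot-product attention in PyTorch. This is a generic illustration, not code from knotgrass/attention; the class name `MiniAttention` and its default sizes are invented for the example.

```python
# Minimal, generic multi-head scaled dot-product attention in PyTorch.
# Illustrative sketch only; not taken from the knotgrass/attention repository.
import math
import torch
import torch.nn as nn


class MiniAttention(nn.Module):
    """Single-module multi-head self-attention, the basic block that most of the
    repositories listed below extend or optimise."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # project to queries, keys, values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, tokens, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        attn = scores.softmax(dim=-1)          # attention weights over tokens
        y = attn @ v                           # weighted sum of values
        y = y.transpose(1, 2).reshape(b, t, d) # merge heads back
        return self.out(y)


# quick smoke test
x = torch.randn(2, 16, 64)
print(MiniAttention()(x).shape)  # torch.Size([2, 16, 64])
```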
Related projects
Alternatives and complementary repositories for attention
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆92 · Updated last month
- Utilities for Training Very Large Models ☆56 · Updated last month
- Contextual Position Encoding but with some custom CUDA kernels (https://arxiv.org/abs/2405.18719) ☆19 · Updated 5 months ago
- Playground for Transformers ☆42 · Updated 11 months ago
- Code for the DDP tutorial ☆32 · Updated 2 years ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆33 · Updated 5 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 3 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆71 · Updated this week
- PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" ☆60 · Updated last week
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …" (a generic sketch of the idea follows after this list) ☆133 · Updated 6 months ago
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆19 · Updated 8 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 · Updated 5 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆66 · Updated last year
- A simple torch implementation of high-performance Multi-Query Attention ☆15 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆104 · Updated last month
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆52 · Updated last week
- The Efficiency Spectrum of LLMs ☆52 · Updated 11 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆49 · Updated last week
- A repository for DenseSSMs ☆88 · Updated 7 months ago
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆74 · Updated 5 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆43 · Updated last year
- Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… ☆84 · Updated last week
- PyTorch implementation of MoE (mixture of experts) ☆32 · Updated 3 years ago
- Implementation of Agent Attention in PyTorch ☆86 · Updated 4 months ago
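As referenced in the grouped-query attention (GQA) entry above, here is a small sketch of the GQA idea: query heads are split into groups that each share a single key/value head, which shrinks the key/value projections and cache. This is an assumed, generic illustration, not the linked repository's implementation; the class name and head counts are made up for the example.

```python
# Generic grouped-query attention (GQA) sketch: several query heads share one KV head.
# Illustrative only; hyperparameters and names are assumptions, not the linked repo's API.
import math
import torch
import torch.nn as nn


class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.group = n_heads // n_kv_heads            # query heads per KV head
        self.q_proj = nn.Linear(d_model, n_heads * self.d_head)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        # repeat each KV head so it is shared by `group` consecutive query heads
        k = k.repeat_interleave(self.group, dim=1)
        v = v.repeat_interleave(self.group, dim=1)
        attn = (q @ k.transpose(-2, -1) / math.sqrt(self.d_head)).softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)  # merge heads back
        return self.out(y)


# quick smoke test
print(GroupedQueryAttention()(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```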