piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head via expert-choice routing, providing a content-based sparse attention mechanism; a rough sketch of the idea is shown below.
☆10 · Updated last week
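For readers skimming this listing, here is a minimal sketch of the mechanism described above: each head scores the whole sequence, keeps its top-k tokens (expert-choice routing), attends densely within that subset, and scatters the gated outputs back to their positions. All function names, tensor shapes, and the gating/combination details are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of expert-choice token routing for sparse attention,
# loosely following the MoSA description above. Names and shapes are
# illustrative assumptions, not the repo's actual implementation.
import torch
import torch.nn.functional as F


def mosa_style_sparse_attention(x, w_route, w_q, w_k, w_v, k_tokens):
    """x: (batch, seq, dim); w_route: (heads, dim); w_q/w_k/w_v: (heads, dim, head_dim)."""
    batch, seq, dim = x.shape
    heads = w_route.shape[0]
    head_dim = w_q.shape[-1]

    # Each head scores every token and keeps its top-k (expert-choice routing).
    scores = torch.einsum("bsd,hd->bhs", x, w_route)             # (batch, heads, seq)
    gates, idx = scores.softmax(dim=-1).topk(k_tokens, dim=-1)   # both (batch, heads, k)

    # Gather each head's selected tokens.
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, -1, dim)       # (batch, heads, k, dim)
    x_sel = torch.gather(x.unsqueeze(1).expand(-1, heads, -1, -1), 2, gather_idx)

    # Dense attention restricted to each head's selected tokens.
    q = torch.einsum("bhkd,hde->bhke", x_sel, w_q)
    k = torch.einsum("bhkd,hde->bhke", x_sel, w_k)
    v = torch.einsum("bhkd,hde->bhke", x_sel, w_v)
    attn = F.scaled_dot_product_attention(q, k, v)               # (batch, heads, k, head_dim)

    # Scatter the gated head outputs back to their original token positions.
    out = x.new_zeros(batch, heads, seq, head_dim)
    scatter_idx = idx.unsqueeze(-1).expand(-1, -1, -1, head_dim).contiguous()
    out.scatter_add_(2, scatter_idx, gates.unsqueeze(-1) * attn)
    return out.permute(0, 2, 1, 3).reshape(batch, seq, heads * head_dim)


# Example: 2 heads, each attending over its own 4 selected tokens out of 16.
x = torch.randn(1, 16, 32)
out = mosa_style_sparse_attention(
    x,
    w_route=torch.randn(2, 32),
    w_q=torch.randn(2, 32, 8),
    w_k=torch.randn(2, 32, 8),
    w_v=torch.randn(2, 32, 8),
    k_tokens=4,
)
print(out.shape)  # torch.Size([1, 16, 16])
```

Because each head only attends within its k selected tokens, the attention cost per head scales with k rather than the full sequence length, which is the content-based sparsity the description refers to.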
Alternatives and similar repositories for MoSA:
Users interested in MoSA are comparing it to the repositories listed below.
- ☆32 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion · ☆54 · Updated 8 months ago
- A simple torch implementation of high-performance Multi-Query Attention · ☆17 · Updated last year
- Code for the paper "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" · ☆13 · Updated 7 months ago
- ☆16 · Updated 9 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN from our NeurIPS 2023 paper "Hierarchically Gated Recurrent Neural Network for Se…" · ☆64 · Updated last year
- Here we will test various linear attention designs · ☆60 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" · ☆26 · Updated 6 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆27 · Updated last year
- PyTorch implementation of the models from "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" · ☆30 · Updated 2 weeks ago