eamartin / parallelizing_linear_rnns
☆41 Updated 6 years ago
Related projects:
- ☆42 Updated 7 months ago
- Parallel Associative Scan for Language Models ☆16 Updated 8 months ago
- ☆30 Updated 8 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆60 Updated 4 months ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆19 Updated last year
- ☆48 Updated 4 months ago
- ☆42 Updated 3 months ago
- Blog post ☆16 Updated 7 months ago
- ☆33 Updated 8 months ago
- ☆41 Updated 2 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆34 Updated 9 months ago
- Parallelizing non-linear sequential models over the sequence length ☆40 Updated last month
- Sparse Backpropagation for Mixture-of-Expert Training ☆17 Updated 2 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆86 Updated 3 months ago
- ☆34 Updated this week
- Using FlexAttention to compute attention with different masking patterns ☆28 Updated last week
- Efficient PScan implementation in PyTorch ☆15 Updated 8 months ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆66 Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆50 Updated 10 months ago
- RWKV model implementation ☆38 Updated last year
- Here we will test various linear attention designs. ☆55 Updated 4 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆94 Updated last year
- A PyTorch wrapper of parallel exclusive scan in CUDA ☆8 Updated last year
- The official Languini Kitchen repository ☆14 Updated 4 months ago
- Accelerated First Order Parallel Associative Scan ☆151 Updated last month
- ☆28 Updated last week
- ☆66 Updated 3 months ago
- Structured matrices for compressing neural networks ☆65 Updated 11 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆123 Updated 4 months ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆58 Updated 2 years ago