openai / sparse_attention
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
☆1,562 · Updated 4 years ago
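For context, the strided sparse pattern described in that paper can be sketched as a boolean attention mask in a few lines. This is a minimal NumPy sketch, not the repository's actual API; the function name and the `stride` parameter are illustrative.

```python
import numpy as np

def strided_mask(n, stride):
    """Strided sparse attention pattern, as sketched in "Generating Long
    Sequences with Sparse Transformers": position i attends to the previous
    `stride` positions (local component) and to every stride-th earlier
    position (strided component), instead of all i earlier positions.
    Illustrative sketch only, not the repo's implementation."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: only attend to j <= i
            local = (i - j) < stride
            strided = (i - j) % stride == 0
            mask[i, j] = local or strided
    return mask

m = strided_mask(8, stride=4)
```

Each row of the mask then has O(stride + n/stride) nonzero entries rather than O(n), which is the source of the memory savings for long sequences.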
Alternatives and similar repositories for sparse_attention: users interested in sparse_attention are comparing it to the libraries listed below.
- Reformer, the efficient Transformer, in Pytorch ☆2,155 · Updated last year
- Pytorch library for fast transformer implementations ☆1,687 · Updated 2 years ago
- Transformer training code for sequential tasks ☆610 · Updated 3 years ago
- An implementation of Performer, a linear attention-based transformer, in Pytorch ☆1,116 · Updated 3 years ago
- On the Variance of the Adaptive Learning Rate and Beyond ☆2,546 · Updated 3 years ago
- PyTorch implementation of "Efficient Neural Architecture Search via Parameters Sharing" ☆2,711 · Updated last year
- Make huge neural nets fit in memory ☆2,773 · Updated 4 years ago
- Differentiable architecture search for convolutional and recurrent networks ☆3,945 · Updated 4 years ago
- Single Headed Attention RNN - "Stop thinking with your head" ☆1,181 · Updated 3 years ago
- Longformer: The Long-Document Transformer ☆2,094 · Updated 2 years ago
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆606 · Updated 8 months ago
- My take on a practical implementation of Linformer for Pytorch ☆413 · Updated 2 years ago
- 🐥 A PyTorch implementation of OpenAI's finetuned transformer language model, with a script to import the weights pre-trained by OpenAI ☆1,508 · Updated 3 years ago
- TensorFlow code for the paper "Efficient Neural Architecture Search via Parameter Sharing" ☆1,579 · Updated 5 years ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining ☆2,903 · Updated 2 years ago
- Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CAS… ☆745 · Updated 2 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆748 · Updated last year
- Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755) ☆2,100 · Updated 3 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,112 · Updated 2 years ago
- Lingvo ☆2,833 · Updated last week
- Transformer based on a variant of attention that is of linear complexity with respect to sequence length ☆751 · Updated 10 months ago
- higher is a PyTorch library allowing users to obtain higher-order gradients over losses spanning training loops rather than individual tr… ☆1,610 · Updated 2 years ago
- A list of efficient attention modules ☆996 · Updated 3 years ago
- Implementation of https://arxiv.org/abs/1904.00962 (the LAMB optimizer) ☆372 · Updated 4 years ago
- Code and model for the paper "Improving Language Understanding by Generative Pre-Training" ☆2,196 · Updated 6 years ago
- Library for faster pinned CPU <-> GPU transfer in Pytorch ☆685 · Updated 5 years ago
- Mesh TensorFlow: Model Parallelism Made Easier ☆1,601 · Updated last year
- Compare GAN code ☆1,821 · Updated 4 years ago
- Phrase-Based & Neural Unsupervised Machine Translation ☆1,502 · Updated 3 years ago
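Several of the repositories above (Performer, the fast-transformers library, the linear-complexity transformer) build on kernelized linear attention, where softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV) so the cost is linear in sequence length. A minimal non-causal NumPy sketch, assuming the elu(x)+1 feature map from Katharopoulos et al. (all names here are illustrative, not any library's API):

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention sketch: replace softmax(Q K^T) V with
    phi(Q) (phi(K)^T V), computed in O(n) with respect to sequence length.
    phi(x) = elu(x) + 1 keeps features positive (an assumption matching
    Katharopoulos et al.; Performer instead uses random features)."""
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                    # (d, d_v): size independent of n
    z = q @ k.sum(axis=0)           # per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((5, 4))
v = rng.standard_normal((5, 3))
out = linear_attention(q, k, v)
```

By associativity, this produces (up to the `eps` stabilizer) the same output as forming the full n×n kernel matrix φ(Q)φ(K)ᵀ, normalizing its rows, and multiplying by V, but without ever materializing it.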