DeadAt0m / adafactor-pytorch
A PyTorch implementation of Adafactor (https://arxiv.org/pdf/1804.04235.pdf)
☆23 · Updated 5 years ago
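Adafactor's key idea is to avoid storing a full second-moment matrix like Adam does: for an n×m weight matrix it keeps only per-row and per-column running statistics and reconstructs the second-moment estimate as a rank-1 outer product. The sketch below is a minimal pure-Python illustration of that factored update, written against the paper rather than this repository; the function and variable names are illustrative (not the repo's API), and it omits Adafactor's relative step sizes and RMS update clipping for brevity.

```python
# Illustrative sketch of Adafactor's factored second-moment estimate
# (Shazeer & Stern, 2018). Names are hypothetical, not this repo's API;
# relative step sizes and update clipping from the paper are omitted.
import math

def adafactor_step(W, G, R, C, lr=0.01, beta2=0.999, eps=1e-30):
    """One update on an n x m weight matrix W given gradient G.

    R: length-n running row statistics, C: length-m running column
    statistics (EMAs of squared-gradient row/column sums).
    Returns the updated (W, R, C).
    """
    n, m = len(G), len(G[0])
    # Row and column sums of the squared gradient (eps for stability).
    row = [sum(G[i][j] ** 2 + eps for j in range(m)) for i in range(n)]
    col = [sum(G[i][j] ** 2 + eps for i in range(n)) for j in range(m)]
    # Exponential moving averages of the factored statistics.
    R = [beta2 * R[i] + (1 - beta2) * row[i] for i in range(n)]
    C = [beta2 * C[j] + (1 - beta2) * col[j] for j in range(m)]
    s = sum(R)  # normalizer so R[i] * C[j] / s approximates V[i][j]
    for i in range(n):
        for j in range(m):
            v = R[i] * C[j] / s  # rank-1 second-moment estimate
            W[i][j] -= lr * G[i][j] / math.sqrt(v)
    return W, R, C
```

The memory win is the point: instead of n·m second-moment entries, only n + m numbers are stored per matrix, which is what makes Adafactor attractive for very large embedding and projection layers.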
Alternatives and similar repositories for adafactor-pytorch:
Users interested in adafactor-pytorch commonly compare it to the libraries listed below.
- Axial Positional Embedding for Pytorch ☆70 · Updated last week
- Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever" ☆32 · Updated 4 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆118 · Updated 3 years ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention". ☆44 · Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 2 years ago
- A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering ☆41 · Updated 4 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 · Updated 2 years ago
- Implementation of Kronecker Attention in Pytorch ☆18 · Updated 4 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆48 · Updated 3 years ago
- Code for the paper PermuteFormer ☆42 · Updated 3 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch ☆45 · Updated 3 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 · Updated 4 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- Code for Explicit Sparse Transformer ☆60 · Updated last year
- High-performance PyTorch modules ☆18 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆37 · Updated 3 years ago
- Another attempt at a long-context / efficient transformer by me ☆37 · Updated 2 years ago
- Implementation of Fast Transformer in Pytorch ☆172 · Updated 3 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆35 · Updated 3 years ago
- Code for "Understanding and Improving Layer Normalization" ☆46 · Updated 5 years ago
- ☆29 · Updated 2 years ago
- Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch ☆57 · Updated 3 years ago
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron … ☆32 · Updated last year
- ☆37 · Updated last year
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 4 years ago
- ☆47 · Updated 4 years ago
- Implementation of Mogrifier LSTM in PyTorch ☆35 · Updated 4 years ago
- A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models" ☆72 · Updated 2 years ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 · Updated 3 years ago