lucidrains / memformerLinks

Implementation of Memformer, a Memory-augmented Transformer, in Pytorch

☆119

Alternatives and similar repositories for memformer

Users that are interested in memformer are comparing it to the libraries listed below

Sorting:

lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
lucidrains / compressive-transformer-pytorch
Pytorch implementation of Compressive Transformers, from Deepmind
☆162Updated 3 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method (NeurIPS 2021)
☆62Updated 3 years ago
lucidrains / product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆82Updated last year
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆119Updated 4 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114Updated last year
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225Updated 3 years ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34Updated 2 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆204Updated last year
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81Updated 3 years ago
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆70Updated 2 years ago
rabeehk / compacter
☆129Updated 2 years ago
Noahs-ARK / RFA
☆33Updated 4 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆83Updated 5 months ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33Updated last month
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆105Updated 4 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34Updated 4 years ago
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆79Updated last year
RobertCsordas / transformer_generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s…
☆67Updated 2 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆62Updated 2 years ago
google-research / head2toe
☆81Updated last year
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆165Updated 4 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆107Updated 4 years ago
deep-spin / infinite-former
☆65Updated 11 months ago
google-deepmind / emergent_in_context_learning
☆84Updated last year
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
☆108Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆297Updated 2 years ago
lucidrains / g-mlp-gpt
GPT, but made only out of MLPs
☆89Updated 4 years ago