lucidrains / compressive-transformer-pytorchLinks

Pytorch implementation of Compressive Transformers, from Deepmind

☆162

Alternatives and similar repositories for compressive-transformer-pytorch

Users that are interested in compressive-transformer-pytorch are comparing it to the libraries listed below

Sorting:

lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆123Updated 4 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆108Updated 4 years ago
lucidrains / ponder-transformer
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper
☆81Updated 3 years ago
toshas / torch-discounted-cumsum
Fast Discounted Cumulative Sums in PyTorch
☆96Updated 4 years ago
facebookresearch / transformer-sequential
Trains Transformer model variants. Data isn't shuffled between batches.
☆143Updated 3 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆206Updated 2 years ago
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆105Updated 4 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆296Updated 3 years ago
RobertCsordas / transformer_generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s…
☆67Updated 2 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
lucidrains / g-mlp-gpt
GPT, but made only out of MLPs
☆89Updated 4 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114Updated last year
lucidrains / HTM-pytorch
Implementation of Hierarchical Transformer Memory (HTM) for Pytorch
☆75Updated 4 years ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34Updated 4 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120Updated 4 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228Updated 3 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆300Updated 2 years ago
kzl / universal-computation
Official codebase for Pretrained Transformers as Universal Computation Engines.
☆247Updated 3 years ago
lucidrains / mlm-pytorch
An implementation of masked language modeling for Pytorch, made as concise and simple as possible
☆179Updated 2 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆151Updated 2 years ago
lucidrains / product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆83Updated last year
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆83Updated 8 months ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89Updated 4 years ago
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆268Updated 4 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆330Updated 3 years ago
cpcp1998 / PermuteFormer
Code for the paper PermuteFormer
☆42Updated 4 years ago
sarthmit / Compositional-Attention
Code to reproduce the results for Compositional Attention
☆59Updated 2 years ago
ctlllll / SGConv
☆164Updated 2 years ago