pkuzengqi / SkyformerLinks

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method (NeurIPS 2021)

☆62

Alternatives and similar repositories for Skyformer

Users that are interested in Skyformer are comparing it to the libraries listed below

Sorting:

lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆119Updated 4 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114Updated last year
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34Updated 2 years ago
schwartz-lab-NLP / papa
Code for the PAPA paper
☆27Updated 2 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33Updated 2 years ago
lucidrains / learning-to-expire-pytorch
An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain
☆34Updated 4 years ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49Updated 5 years ago
lucidrains / coco-lm-pytorch
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch
☆46Updated 4 years ago
lucidrains / rela-transformer
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Updated 3 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆119Updated 4 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆62Updated 2 years ago
dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74Updated 4 years ago
Noahs-ARK / RFA
☆33Updated 4 years ago
lucidrains / n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch
☆76Updated 2 years ago
antofuller / configaformers
A python library for highly configurable transformers - easing model architecture search and experimentation.
☆49Updated 3 years ago
cliang1453 / SAGE
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
☆30Updated 3 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆33Updated last month
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101Updated 2 years ago
lucidrains / token-shift-gpt
Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing
☆50Updated 3 years ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆45Updated 4 years ago
lucidrains / multistream-transformers
Implementation of Multistream Transformers in Pytorch
☆54Updated 4 years ago
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated last month
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
RobertCsordas / transformer_generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s…
☆67Updated 2 years ago
rabeehk / compacter
☆129Updated 2 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225Updated 3 years ago
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆70Updated 2 years ago
JetRunner / PABEE
Code for the paper "BERT Loses Patience: Fast and Robust Inference with Early Exit".
☆65Updated 4 years ago
10-zin / Synthesizer
A PyTorch implementation of the paper - "Synthesizer: Rethinking Self-Attention in Transformer Models"
☆73Updated 2 years ago
microsoft / EfficientLongSequenceModeling
☆51Updated 2 years ago