santient / sparse-transformer

Sparse Transformer with limited attention span in PyTorch

☆12

Alternatives and similar repositories for sparse-transformer:

Users who are interested in sparse-transformer are comparing it to the libraries listed below.

davidsvy / cosformer-pytorch
Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention".
☆44 Updated 3 years ago
rishikksh20 / rectified-linear-attention
Sparse Attention with Linear Units
☆17 Updated 3 years ago
CupidJay / MoCov3-pytorch
custom pytorch implementation of MoCo v3
☆45 Updated 3 years ago
fawazsammani / mogrifier-lstm-pytorch
Implementation of Mogrifier LSTM in PyTorch
☆35 Updated 4 years ago
bellymonster / Weighted-Soft-Label-Distillation
☆57 Updated 3 years ago
calclavia / Performer-Pytorch
Pytorch implementation of Performer from the paper "Rethinking Attention with Performers".
☆24 Updated 4 years ago
kuixu / Linear-Multihead-Attention
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)
☆73 Updated 4 years ago
LibertFan / MAN
Mask Attention Networks: Rethinking and Strengthen Transformer in NAACL2021
☆14 Updated 3 years ago
jaketae / g-mlp
PyTorch implementation of Pay Attention to MLPs
☆40 Updated 3 years ago
lancopku / Explicit-Sparse-Transformer
code for Explicit Sparse Transformer
☆60 Updated last year
VITA-Group / layerGraftedPretraining_ICLR23
[ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…
☆23 Updated 2 years ago
Chenglin-Yang / LESA
Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
☆20 Updated 3 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70 Updated 4 years ago
DequanWang / weightnet.pytorch
WeightNet: Revisiting the Design Space of Weight Networks
☆19 Updated 4 years ago
fundamentalvision / Siamese-Image-Modeling
☆16 Updated last year
lucidrains / cross-transformers-pytorch
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch
☆51 Updated 3 years ago
renqianluo / SemiNAS
☆24 Updated 4 years ago
xiaomi-automl / MixPath
MixPath: A Unified Approach for One-shot Neural Architecture Search
☆28 Updated 4 years ago
Xianchao-Wu / perceiver-pytorch
☆42 Updated 3 years ago
lancopku / AdaNorm
Code for "Understanding and Improving Layer Normalization"
☆46 Updated 5 years ago
MAC-AutoML / YOCO-BERT
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu…
☆48 Updated 3 years ago
DeadAt0m / adafactor-pytorch
A pytorch realization of adafactor (https://arxiv.org/pdf/1804.04235.pdf)
☆23 Updated 5 years ago
maple-research-lab / CaCo
CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-adversarial Contrastive Learning
☆19 Updated 11 months ago
TencentARC / DTN
Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
☆28 Updated 2 years ago
shaabhishek / gumbel-softmax-pytorch
categorical variational autoencoder using the Gumbel-Softmax estimator
☆25 Updated 5 years ago
MetaLearners / Solution-to-CVPR2021-NAS-competition-Track-1
☆13 Updated 3 years ago
szq0214 / S2-BNN
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration (CVPR 2021)
☆64 Updated 3 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆32 Updated last year
ziplab / EcoFormer
[NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity"
☆70 Updated 2 years ago
czhang0528 / MosaicOS
[ICCV 2021] MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection
☆29 Updated 3 years ago