twistedcubic / attention-rank-collapse

[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.

☆168

Alternatives and similar repositories for attention-rank-collapse

Users who are interested in attention-rank-collapse are comparing it to the repositories listed below.

NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆228 · Updated 3 years ago
google-research / head2toe
☆81 · Updated last year
ischlag / fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
☆110 · Updated 4 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 · Updated 4 years ago
takashiishida / flooding
[ICML 2020] code for the flooding regularizer proposed in "Do We Need Zero Training Loss After Achieving Zero Training Error?"
☆95 · Updated 2 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆152 · Updated 2 years ago
VainF / Awesome-Contrastive-Learning
Awesome Contrastive Learning for CV & NLP
☆165 · Updated 4 years ago
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆269 · Updated 4 years ago
OpenNLPLab / cosFormer
[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
☆196 · Updated 3 years ago
XuezheMax / apollo
Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
☆182 · Updated 4 years ago
leaderj1001 / Synthesizer-Rethinking-Self-Attention-Transformer-Models
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch
☆70 · Updated 5 years ago
lucidrains / long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
☆120 · Updated 4 years ago
chingyaoc / DCL
NeurIPS 2020, Debiased Contrastive Learning
☆284 · Updated 2 years ago
kashif / ICLR2022-OpenReviewData
Crawl & visualize ICLR papers and reviews.
☆107 · Updated 3 years ago
ryanchankh / mcr2
Official Implementation of Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction (2020)
☆201 · Updated 2 years ago
KrisKorrel / sparsemax-pytorch
Implementation of Sparsemax activation in Pytorch
☆166 · Updated 5 years ago
AvivNavon / AuxiLearn
Official implementation of Auxiliary Learning by Implicit Differentiation [ICLR 2021]
☆86 · Updated last year
facebookresearch / graph2nn
code for paper "Graph Structure of Neural Networks"
☆155 · Updated 4 years ago
facebookresearch / luckmatters
Understanding Training Dynamics of Deep ReLU Networks
☆304 · Updated last month
ryanchankh / redunet_demo
☆84 · Updated 4 years ago
jiamings / ml-cpc
☆36 · Updated 5 years ago
sarthmit / Compositional-Attention
Code to reproduce the results for Compositional Attention
☆59 · Updated 3 years ago
Junya-Chen / FlatCLR
FlatNCE: A Novel Contrastive Representation Learning Objective
☆89 · Updated 4 years ago
thegregyang / LossUpAccUp
Loss and accuracy go opposite ways...right?
☆95 · Updated 5 years ago
xtinkt / editable
Supplementary code for Editable Neural Networks, an ICLR 2020 submission.
☆46 · Updated 5 years ago
YongfeiYan / Gumbel_Softmax_VAE
PyTorch implementation of a Variational Autoencoder with Gumbel-Softmax Distribution
☆212 · Updated 7 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆298 · Updated 4 years ago
tstandley / taskgrouping
Code for Which Tasks Should Be Learned Together in Multi-task Learning?
☆98 · Updated 2 years ago
ssnl / align_uniform
Open source code for paper "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere" ICML 2…
☆459 · Updated 3 years ago
juntang-zhuang / GSAM
PyTorch repository for ICLR 2022 paper (GSAM) which improves generalization (e.g. +3.8% top-1 accuracy on ImageNet with ViT-B/32)
☆144 · Updated 3 years ago