twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆171 Mar 8, 2021 Updated 4 years ago
Alternatives and similar repositories for attention-rank-collapse
Users who are interested in attention-rank-collapse are comparing it to the libraries listed below.
- ☆12 Sep 26, 2019 Updated 6 years ago
- ☆13 Feb 16, 2021 Updated 5 years ago
- ☆26 Nov 23, 2023 Updated 2 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆16 Oct 28, 2021 Updated 4 years ago
- ☆388 Oct 18, 2023 Updated 2 years ago
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms ☆20 Nov 29, 2021 Updated 4 years ago
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆47 Mar 24, 2023 Updated 2 years ago
- ☆20 Feb 26, 2021 Updated 4 years ago
- ☆118 Feb 11, 2025 Updated last year
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆879 Oct 30, 2023 Updated 2 years ago
- RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder ☆210 Mar 18, 2021 Updated 4 years ago
- ☆27 Jul 28, 2025 Updated 6 months ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆335 Jan 26, 2023 Updated 3 years ago
- [CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator ☆1,317 Jul 16, 2021 Updated 4 years ago
- ☆46 Apr 13, 2022 Updated 3 years ago
- Efficient Householder Transformation in PyTorch ☆69 Jul 6, 2021 Updated 4 years ago
- Code for ICML2021 paper 'Commutative Lie Group VAE for Disentanglement Learning'. ☆23 Nov 2, 2022 Updated 3 years ago
- Disentangled Non-Local Neural Networks ☆83 Dec 7, 2020 Updated 5 years ago
- RE3: State Entropy Maximization with Random Encoders for Efficient Exploration ☆69 Jul 29, 2021 Updated 4 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆67 Dec 16, 2022 Updated 3 years ago
- Variational Walkback, NIPS'17 ☆28 Oct 18, 2017 Updated 8 years ago
- Neural Unification for Logic Reasoning over Language ☆22 Nov 15, 2021 Updated 4 years ago
- Repo for the paper "Bounding Training Data Reconstruction in Private (Deep) Learning". ☆11 Jun 16, 2023 Updated 2 years ago
- ☆10 Jan 28, 2021 Updated 5 years ago
- ☆44 Mar 3, 2023 Updated 2 years ago
- Official repository for the paper: "Trees with Attention for Set Prediction Tasks" (ICML21) ☆10 Jan 19, 2022 Updated 4 years ago
- ☆10 May 24, 2020 Updated 5 years ago
- ☆10 Jun 3, 2019 Updated 6 years ago
- ☆42 Mar 23, 2023 Updated 2 years ago
- ☆11 Feb 25, 2025 Updated 11 months ago
- Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models" ☆14 May 31, 2023 Updated 2 years ago
- Code for 'Contrastive Multi-Document Question Generation' ☆11 Oct 16, 2022 Updated 3 years ago
- [WACV21] Code for our paper: Samuel, Atzmon and Chechik, "From Generalized zero-shot learning to long-tail with class descriptors" ☆28 Apr 6, 2021 Updated 4 years ago
- Implementation of the models and datasets used in "An Information-theoretic Approach to Distribution Shifts" ☆25 Nov 2, 2021 Updated 4 years ago
- Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper. ☆1,539 Jul 30, 2024 Updated last year
- [NeurIPS'2021] "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up", Yifan Jiang, Shiyu Chang, Zhangyang Wang ☆1,690 Nov 3, 2022 Updated 3 years ago
- This is the public github for our paper "Transformer with a Mixture of Gaussian Keys" ☆28 Aug 13, 2022 Updated 3 years ago
- ☆62 Apr 19, 2022 Updated 3 years ago
- This repository is for the paper "A generative nonparametric Bayesian model for whole genomes" ☆14 Jun 7, 2023 Updated 2 years ago