[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆171Mar 8, 2021Updated 5 years ago
Alternatives and similar repositories for attention-rank-collapse
Users that are interested in attention-rank-collapse are comparing it to the libraries listed below
- ☆12Sep 26, 2019Updated 6 years ago
- ☆13Feb 16, 2021Updated 5 years ago
- ☆26Nov 23, 2023Updated 2 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th…☆16Oct 28, 2021Updated 4 years ago
- ☆388Oct 18, 2023Updated 2 years ago
- This website is to host a series of tutorials on Deep Learning on Graphs for Natural Language Processing.☆13Sep 19, 2022Updated 3 years ago
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models☆47Mar 24, 2023Updated 2 years ago
- ☆20Feb 26, 2021Updated 5 years ago
- ☆118Feb 11, 2025Updated last year
- Code for the TCS paper "On the performance of learned data structures" and the ICML paper "Why are learned indexes so effective?"☆21May 9, 2021Updated 4 years ago
- RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder☆211Mar 18, 2021Updated 4 years ago
- ☆20Mar 22, 2024Updated last year
- End-to-end training of sparse deep neural networks with little-to-no performance loss.☆335Jan 26, 2023Updated 3 years ago
- [CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator☆1,317Jul 16, 2021Updated 4 years ago
- ACL 2023 Dual-Alignment Pre-training for Cross-lingual Sentence Embedding☆24Aug 21, 2024Updated last year
- ☆46Apr 13, 2022Updated 3 years ago
- Efficient Householder Transformation in PyTorch☆69Jul 6, 2021Updated 4 years ago
- Disentangled Non-Local Neural Networks☆84Dec 7, 2020Updated 5 years ago
- RE3: State Entropy Maximization with Random Encoders for Efficient Exploration☆69Jul 29, 2021Updated 4 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s…☆67Dec 16, 2022Updated 3 years ago
- Variational Walkback, NIPS'17☆28Oct 18, 2017Updated 8 years ago
- Neural Unification for Logic Reasoning over Language☆22Nov 15, 2021Updated 4 years ago
- ☆10Jan 28, 2021Updated 5 years ago
- Official repository for the paper: "Trees with Attention for Set Prediction Tasks" (ICML21)☆10Jan 19, 2022Updated 4 years ago
- Repo for the paper "Bounding Training Data Reconstruction in Private (Deep) Learning".☆11Jun 16, 2023Updated 2 years ago
- ☆11Feb 25, 2025Updated last year
- ☆10May 24, 2020Updated 5 years ago
- Companion repository to "Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models"☆14May 31, 2023Updated 2 years ago
- Code for 'Contrastive Multi-Document Question Generation'☆11Oct 16, 2022Updated 3 years ago
- ☆42Mar 23, 2023Updated 2 years ago
- [WACV21] Code for our paper: Samuel, Atzmon and Chechik, "From Generalized zero-shot learning to long-tail with class descriptors"☆27Apr 6, 2021Updated 4 years ago
- Implementation of the models and datasets used in "An Information-theoretic Approach to Distribution Shifts"☆25Nov 2, 2021Updated 4 years ago
- Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.☆1,539Jul 30, 2024Updated last year
- [NeurIPS 2021] "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up", Yifan Jiang, Shiyu Chang, Zhangyang Wang☆1,690Nov 3, 2022Updated 3 years ago
- ☆62Apr 19, 2022Updated 3 years ago
- Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.☆147Jul 26, 2021Updated 4 years ago
- lanmt ebm☆12Jun 19, 2020Updated 5 years ago
- This repository is for the paper "A generative nonparametric Bayesian model for whole genomes"☆14Jun 7, 2023Updated 2 years ago
- Notebooks for managing NeurIPS 2014 and analysing the NeurIPS experiment.☆13May 22, 2024Updated last year