[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆172 · Mar 8, 2021 · Updated 5 years ago
Alternatives and similar repositories for attention-rank-collapse
Users that are interested in attention-rank-collapse are comparing it to the libraries listed below.
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Feb 26, 2024 · Updated 2 years ago
- ☆12 · Sep 26, 2019 · Updated 6 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆16 · Oct 28, 2021 · Updated 4 years ago
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms ☆20 · Nov 29, 2021 · Updated 4 years ago
- Code for the TCS paper "On the performance of learned data structures" and the ICML paper "Why are learned indexes so effective?" ☆21 · May 9, 2021 · Updated 5 years ago
- Code Release for the 2023 NeurIPS Paper How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained langua… ☆17 · Dec 6, 2024 · Updated last year
- ☆391 · Oct 18, 2023 · Updated 2 years ago
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆47 · Mar 24, 2023 · Updated 3 years ago
- A study of performance of optimal transport. ☆10 · Jul 4, 2020 · Updated 5 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆118 · Jul 15, 2021 · Updated 4 years ago
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- Variational Walkback, NIPS'17 ☆28 · Oct 18, 2017 · Updated 8 years ago