[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆172 · Mar 8, 2021 · Updated 5 years ago
Alternatives and similar repositories for attention-rank-collapse
Users who are interested in attention-rank-collapse are comparing it to the libraries listed below.
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Feb 26, 2024 · Updated 2 years ago
- [NeurIPS 2020] Simple and practical private mean and covariance estimation. ☆35 · Oct 4, 2020 · Updated 5 years ago
- ☆12 · Sep 26, 2019 · Updated 6 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆16 · Oct 28, 2021 · Updated 4 years ago
- ☆13 · Feb 16, 2021 · Updated 5 years ago
- Code for the TCS paper "On the performance of learned data structures" and the ICML paper "Why are learned indexes so effective?" ☆21 · May 9, 2021 · Updated 5 years ago
- Code Release for the 2023 NeurIPS Paper How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained langua… ☆17 · Dec 6, 2024 · Updated last year
- Efficient Householder Transformation in PyTorch ☆69 · Jul 6, 2021 · Updated 4 years ago
- ☆391 · Oct 18, 2023 · Updated 2 years ago
- [NeurIPS'20] Code for the Paper Compositional Visual Generation and Inference with Energy Based Models ☆47 · Mar 24, 2023 · Updated 3 years ago
- A study of performance of optimal transport. ☆10 · Jul 4, 2020 · Updated 5 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆118 · Jul 15, 2021 · Updated 4 years ago
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- Variational Walkback, NIPS'17 ☆28 · Oct 18, 2017 · Updated 8 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆879 · Oct 30, 2023 · Updated 2 years ago
- ☆26 · Nov 23, 2023 · Updated 2 years ago
- ☆26 · Feb 26, 2026 · Updated 2 months ago
- ☆20 · Feb 26, 2021 · Updated 5 years ago
- ☆120 · Feb 11, 2025 · Updated last year
- This repository is for the paper "A generative nonparametric Bayesian model for whole genomes" ☆15 · Jun 7, 2023 · Updated 2 years ago
- Official codebase for Pretrained Transformers as Universal Computation Engines. ☆246 · Jan 14, 2022 · Updated 4 years ago
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch ☆76 · Dec 4, 2022 · Updated 3 years ago
- MNIST, but with Bezier curves instead of pixels ☆15 · Oct 29, 2021 · Updated 4 years ago
- RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder ☆210 · Mar 18, 2021 · Updated 5 years ago
- The source codes for D2AGE model. Distance-aware DAG Embedding for Proximity Search on Heterogeneous Graphs. ☆12 · Feb 20, 2018 · Updated 8 years ago
- ☆32 · Oct 13, 2021 · Updated 4 years ago
- Implementation of Fast Transformer in Pytorch ☆176 · Aug 26, 2021 · Updated 4 years ago
- [CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator ☆1,314 · Jul 16, 2021 · Updated 4 years ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆336 · Jan 26, 2023 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- ☆46 · Apr 13, 2022 · Updated 4 years ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- Disentangled Non-Local Neural Networks ☆84 · Dec 7, 2020 · Updated 5 years ago
- ☆20 · Mar 22, 2024 · Updated 2 years ago
- Transformer based on a variant of attention that is linear complexity in respect to sequence length ☆832 · May 5, 2024 · Updated 2 years ago
- Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch ☆97 · Feb 19, 2021 · Updated 5 years ago
- Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆17 · Jan 8, 2025 · Updated last year
- [NeurIPS'2021] "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up", Yifan Jiang, Shiyu Chang, Zhangyang Wang ☆1,689 · Nov 3, 2022 · Updated 3 years ago
- Official DeiT repository ☆4,342 · Mar 15, 2024 · Updated 2 years ago