lukemelas / do-you-even-need-attention
Is the attention layer even necessary? (https://arxiv.org/abs/2105.02723)
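The paper's question is concrete: in a Vision Transformer block, can the self-attention sublayer be replaced by a plain feed-forward layer applied across the patch dimension? Below is a minimal sketch of that substitution, not code from the repo; the module names, expansion factor, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeedForwardTokenMixer(nn.Module):
    """Mixes information across patches with a plain MLP, standing in
    for self-attention (illustrative; not the repo's implementation)."""
    def __init__(self, num_patches: int, expansion: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_patches, num_patches * expansion),
            nn.GELU(),
            nn.Linear(num_patches * expansion, num_patches),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim); mix over the patch axis
        return self.net(x.transpose(1, 2)).transpose(1, 2)

class Block(nn.Module):
    """A transformer-style block with the attention sublayer swapped out."""
    def __init__(self, dim: int, num_patches: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mix = FeedForwardTokenMixer(num_patches)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mix = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.token_mix(self.norm1(x))
        return x + self.channel_mix(self.norm2(x))

x = torch.randn(2, 196, 384)     # e.g. 14x14 patches, 384-dim embeddings
print(Block(384, 196)(x).shape)  # torch.Size([2, 196, 384])
```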
⭐484 · Updated 3 years ago
Alternatives and similar repositories for do-you-even-need-attention:
Users interested in do-you-even-need-attention are comparing it to the repositories listed below.
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ⭐225 · Updated 3 years ago
- NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch. Find explanation at tourdeml.github.io/blog/ ⭐345 · Updated last year
- Implementation of ConvMixer for "Patches Are All You Need? 🤷" ⭐1,070 · Updated 2 years ago
- A LARS implementation in PyTorch ⭐344 · Updated 5 years ago
- Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention ⭐262 · Updated 3 years ago
- Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch ⭐425 · Updated 3 years ago
- Code for the Convolutional Vision Transformer (ConViT) ⭐466 · Updated 3 years ago
- Fully featured implementation of Routing Transformer ⭐292 · Updated 3 years ago
- ⭐376 · Updated last year
- EsViT: Efficient self-supervised Vision Transformers ⭐410 · Updated last year
- Pre-trained NFNets with 99% of the accuracy of the official paper "High-Performance Large-Scale Image Recognition Without Normalization". ⭐159 · Updated 4 years ago
- Useful PyTorch functions and modules that are not implemented in PyTorch by default ⭐187 · Updated 11 months ago
- Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc. ⭐235 · Updated 3 years ago
- Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.) ⭐218 · Updated last month
- Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch ⭐304 · Updated 3 years ago
- Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision ⭐218 · Updated 3 years ago
- Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable ⭐214 · Updated 4 years ago
- DeLighT: Very Deep and Light-Weight Transformers ⭐467 · Updated 4 years ago
- Implementation of ResMLP, an all MLP solution to image classification, in Pytorch ⭐197 · Updated 2 years ago
- Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms (a minimal sketch of the mixing step follows this list) ⭐259 · Updated 3 years ago
- Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting V…" ⭐487 · Updated last year
- Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141) ⭐457 · Updated 2 years ago
- ⭐245 · Updated 3 years ago
- Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones ⭐197 · Updated 4 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ⭐119 · Updated 3 years ago
- Ranger deep learning optimizer rewrite to use newest components ⭐329 · Updated last year
- Implementation of the LAMB optimizer (https://arxiv.org/abs/1904.00962) ⭐374 · Updated 4 years ago
- Code for Noisy Student Training. https://arxiv.org/abs/1911.04252 ⭐760 · Updated 4 years ago
- Nested Hierarchical Transformer https://arxiv.org/pdf/2105.12723.pdf ⭐196 · Updated 8 months ago
- Compute CNN receptive field size in pytorch in one line ⭐359 · Updated 11 months ago
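For the FNet entry above, the token-mixing step the paper describes is simple enough to sketch: the self-attention sublayer is replaced by a 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. This is a hedged sketch of that operation, not the linked repo's code:

```python
import torch
import torch.nn as nn

class FourierMixer(nn.Module):
    """Token mixing via a 2D DFT over the sequence and hidden
    dimensions, keeping the real part (as in the FNet paper)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        return torch.fft.fft2(x, dim=(-2, -1)).real

x = torch.randn(2, 128, 256)
print(FourierMixer()(x).shape)  # torch.Size([2, 128, 256])
```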