Separius / awesome-fast-attention
list of efficient attention modules
☆999 Updated 3 years ago
Alternatives and similar repositories for awesome-fast-attention:
Users who are interested in awesome-fast-attention are comparing it to the repositories listed below.
- Source code for "On the Relationship between Self-Attention and Convolutional Layers" ☆1,099 Updated 2 years ago
- Some tricks of pytorch... ☆1,187 Updated 10 months ago
- Debug PyTorch code using PySnooper ☆799 Updated 3 years ago
- DeLighT: Very Deep and Light-Weight Transformers ☆467 Updated 4 years ago
- Some tricks for PyTorch ☆577 Updated 5 years ago
- Gradually-Warmup Learning Rate Scheduler for PyTorch ☆988 Updated 6 months ago
- A PyTorch Implementation of Focal Loss. ☆983 Updated 5 years ago
- ☆876 Updated 11 months ago
- My take on a practical implementation of Linformer for Pytorch. ☆413 Updated 2 years ago
- PyTorch implementation of Contrastive Learning methods ☆1,974 Updated last year
- label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful ☆2,233 Updated 6 months ago
- Pytorch library for fast transformer implementations ☆1,697 Updated 2 years ago
- Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase ☆1,199 Updated last year
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆607 Updated 9 months ago
- Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute ☆1,531 Updated 4 years ago
- A quickstart and benchmark for pytorch distributed training. ☆1,662 Updated 8 months ago
- Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021). ☆2,573 Updated last year
- ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet ☆1,184 Updated last year
- A Pytorch Implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation" ☆556 Updated 4 years ago
- Implementing Attention Augmented Convolutional Networks using Pytorch ☆650 Updated 3 years ago
- Learning Rate Warmup in PyTorch ☆409 Updated last month
- My best practice of training large dataset using PyTorch. ☆1,097 Updated 11 months ago
- A CV toolkit for my papers. ☆2,048 Updated 4 months ago
- Pytorch implementation of the paper "Class-Balanced Loss Based on Effective Number of Samples" ☆796 Updated last year
- Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth" ☆407 Updated 8 months ago
- [arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis ☆1,322 Updated 4 years ago
- A comprehensive list of awesome contrastive self-supervised learning papers. ☆1,267 Updated 7 months ago
- An implementation of Performer, a linear attention-based transformer, in Pytorch ☆1,122 Updated 3 years ago
- lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for pytorch ☆335 Updated 5 years ago
- Sublinear memory optimization for deep learning. https://arxiv.org/abs/1604.06174 ☆598 Updated 5 years ago