Separius / awesome-fast-attention
List of efficient attention modules
☆995 Updated 3 years ago
Alternatives and similar repositories for awesome-fast-attention:
Users who are interested in awesome-fast-attention are comparing it to the repositories listed below:
- Source code for "On the Relationship between Self-Attention and Convolutional Layers" ☆1,096 Updated 2 years ago
- Some tricks of PyTorch... ☆1,178 Updated 8 months ago
- A comprehensive list of awesome contrastive self-supervised learning papers. ☆1,248 Updated 5 months ago
- PyTorch implementation of Contrastive Learning methods ☆1,960 Updated last year
- Gradually-Warmup Learning Rate Scheduler for PyTorch ☆986 Updated 4 months ago
- Debug PyTorch code using PySnooper ☆800 Updated 3 years ago
- My best practices for training on large datasets using PyTorch. ☆1,092 Updated 9 months ago
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆606 Updated 7 months ago
- Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021). ☆2,545 Updated last year
- A quickstart and benchmark for PyTorch distributed training. ☆1,651 Updated 6 months ago
- Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute ☆1,534 Updated 4 years ago
- An All-MLP solution for Vision, from Google AI ☆1,013 Updated 5 months ago
- 😎 An up-to-date & curated list of awesome semi-supervised learning papers, methods & resources. ☆1,814 Updated 8 months ago
- Some tricks for PyTorch ☆578 Updated 5 years ago
- [arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis ☆1,314 Updated 4 years ago
- label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful ☆2,211 Updated 4 months ago
- Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase ☆1,196 Updated last year
- DeLighT: Very Deep and Light-Weight Transformers ☆467 Updated 4 years ago
- Collection for Few-shot Learning ☆973 Updated last year
- This repository contains code for the paper "Decoupling Representation and Classifier for Long-Tailed Recognition", published at ICLR 202… ☆958 Updated 3 years ago
- Unsupervised Data Augmentation (UDA) ☆2,187 Updated 3 years ago
- PyTorch library for fast transformer implementations ☆1,677 Updated last year
- PyTorch memory tracking code ☆1,015 Updated 3 years ago
- Learning Rate Warmup in PyTorch ☆403 Updated 2 weeks ago
- Knowledge distillation papers ☆748 Updated 2 years ago
- ☆872 Updated 8 months ago
- Reformer, the efficient Transformer, in PyTorch ☆2,152 Updated last year
- A multi-task learning example for the paper https://arxiv.org/abs/1705.07115 ☆852 Updated 4 years ago
- A PyTorch Implementation of Focal Loss. ☆974 Updated 5 years ago
- My take on a practical implementation of Linformer for PyTorch. ☆411 Updated 2 years ago