Separius / awesome-fast-attention
list of efficient attention modules
☆988 · Updated 3 years ago
Related projects:
- PyTorch implementation of Contrastive Learning methods ☆1,932 · Updated 11 months ago
- A comprehensive list of awesome contrastive self-supervised learning papers. ☆1,205 · Updated last week
- Awesome Knowledge-Distillation. A categorized collection of knowledge distillation papers (2014-2021). ☆2,468 · Updated last year
- Some tricks for PyTorch... ☆1,152 · Updated 2 months ago
- Source code for "On the Relationship between Self-Attention and Convolutional Layers" ☆1,075 · Updated last year
- Knowledge distillation papers ☆736 · Updated last year
- Reformer, the efficient Transformer, in PyTorch ☆2,097 · Updated last year
- A curated list of Multimodal Related Research. ☆1,295 · Updated last year
- label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful ☆2,159 · Updated last year
- [arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis ☆1,299 · Updated 3 years ago
- A curated list of resources for Learning with Noisy Labels ☆2,621 · Updated 4 months ago
- Some tricks for PyTorch ☆579 · Updated 4 years ago
- Gradually-Warmup Learning Rate Scheduler for PyTorch ☆971 · Updated 3 years ago
- A quickstart and benchmark for PyTorch distributed training. ☆1,617 · Updated last month
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆596 · Updated 2 months ago
- PyTorch memory tracking code ☆992 · Updated 3 years ago
- This repository contains code for the paper "Decoupling Representation and Classifier for Long-Tailed Recognition", published at ICLR 202… ☆938 · Updated 2 years ago
- PyTorch library for fast transformer implementations ☆1,621 · Updated last year
- My best practices for training on large datasets using PyTorch. ☆1,080 · Updated 4 months ago
- PyTorch implementation of various Knowledge Distillation (KD) methods. ☆1,579 · Updated 2 years ago
- A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility ☆1,834 · Updated last year
- A PyTorch Implementation of Focal Loss. ☆951 · Updated 4 years ago
- mixup: Beyond Empirical Risk Minimization ☆1,156 · Updated 2 years ago
- Debug PyTorch code using PySnooper ☆800 · Updated 3 years ago
- An All-MLP solution for Vision, from Google AI ☆987 · Updated this week
- Awesome Knowledge Distillation ☆3,417 · Updated 3 weeks ago
- 😎 An up-to-date & curated list of awesome semi-supervised learning papers, methods & resources. ☆1,770 · Updated 3 months ago
- DeLighT: Very Deep and Light-Weight Transformers ☆465 · Updated 3 years ago
- ☆866 · Updated 3 months ago
- Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase ☆1,185 · Updated 8 months ago