erikwijmans / skynet-ddp-slurm-example
Example of using PyTorch DistributedDataParallel and SLURM on skynet
☆29 Updated 3 years ago
Related projects:
- ☆164 Updated 5 years ago
- ☆61 Updated 4 years ago
- Code for SelfAugment ☆27 Updated 3 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆119 Updated 3 years ago
- Utilities for PyTorch ☆89 Updated last year
- Unsupervised Data Augmentation experiments in PyTorch ☆59 Updated 5 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆97 Updated 3 years ago
- Parameter Efficient Transfer Learning with Diff Pruning ☆70 Updated 3 years ago
- ☆63 Updated 3 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆34 Updated 3 years ago
- Bootstrap Your Own Latent (BYOL) PyTorch implementation using DistributedDataParallel. ☆28 Updated last year
- ☆42 Updated 5 years ago
- Implementation of OmniNet, Omnidirectional Representations from Transformers, in PyTorch ☆53 Updated 3 years ago
- PyTorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition" ☆98 Updated 3 years ago
- ☆46 Updated 3 years ago
- Improving generalization by controlling label-noise information in neural network weights. ☆39 Updated 3 years ago
- Loss and accuracy go opposite ways...right? ☆90 Updated 4 years ago
- Implementation of Memformer, a Memory-augmented Transformer, in PyTorch ☆106 Updated 3 years ago
- Analyzing basic network responses to novel classes ☆37 Updated 2 years ago
- Implementation of Online Label Smoothing in PyTorch ☆94 Updated last year
- Evaluating AlexNet features at various depths ☆38 Updated 3 years ago
- Code for paper "Continual and Multi-Task Architecture Search (ACL 2019)" ☆41 Updated 5 years ago
- Code for reproducing experiments in "How Useful is Self-Supervised Pretraining for Visual Tasks?" ☆60 Updated last month
- Source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT ☆72 Updated last year
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in PyTorch ☆116 Updated 3 years ago
- Code for "Are labels necessary for neural architecture search" ☆93 Updated 6 months ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 Updated 3 years ago
- Big-Interleaved-Dataset ☆57 Updated last year
- [NeurIPS'20] GradAug: A New Regularization Method for Deep Neural Networks ☆93 Updated 3 years ago
- An implementation of shampoo ☆73 Updated 6 years ago