erikwijmans / skynet-ddp-slurm-example
Example of using PyTorch DistributedDataParallel and SLURM on skynet
☆30 Updated 3 years ago
Alternatives and similar repositories for skynet-ddp-slurm-example:
Users who are interested in skynet-ddp-slurm-example are comparing it to the repositories listed below.
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆119 Updated 3 years ago
- Code for SelfAugment ☆27 Updated 4 years ago
- Pre-trained V+L Data Preparation ☆46 Updated 4 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch ☆118 Updated 3 years ago
- An implementation of shampoo ☆74 Updated 7 years ago
- Big-Interleaved-Dataset ☆58 Updated 2 years ago
- Source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT ☆72 Updated 2 years ago
- Implementation of Memformer, a Memory-augmented Transformer, in Pytorch ☆115 Updated 4 years ago
- Code for reproducing experiments in "How Useful is Self-Supervised Pretraining for Visual Tasks?" ☆60 Updated 9 months ago
- An unofficial PyTorch implementation of the HAN and AdaHAN models presented in the "Learning Visual Question Answering by Bootstrapping H… ☆54 Updated 6 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 Updated 4 years ago
- Unsupervised Data Augmentation experiments in PyTorch ☆59 Updated 5 years ago
- A minimal pytorch package implementing a gradient reversal layer. ☆158 Updated 5 months ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER… ☆119 Updated 4 years ago
- A simple but well-performing "single-hop" visual attention model for the GQA dataset ☆20 Updated 5 years ago
- CLASP - Contrastive Language-Aminoacid Sequence Pretraining ☆142 Updated 3 years ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 Updated 3 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆88 Updated 4 years ago
- Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers. ☆104 Updated 3 years ago
- Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation pr… ☆45 Updated 5 years ago
- GPT, but made only out of MLPs ☆88 Updated 3 years ago
- PyTorch CTC Decoder bindings ☆42 Updated 7 years ago
- Parameter Efficient Transfer Learning with Diff Pruning ☆73 Updated 4 years ago