[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 · Jun 20, 2021 · Updated 4 years ago
Alternatives and similar repositories for powernorm
Users who are interested in powernorm are comparing it to the repositories listed below.
- Implementation for NATv2. ☆23 · Feb 20, 2021 · Updated 5 years ago
- ☆19 · Oct 10, 2020 · Updated 5 years ago
- MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems ☆68 · Oct 26, 2021 · Updated 4 years ago
- Using PubMed to find out how a gene contributes to addiction. ☆20 · Dec 27, 2022 · Updated 3 years ago
- Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View" ☆147 · Jun 10, 2019 · Updated 6 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Jul 13, 2022 · Updated 3 years ago
- Large Scale BERT Distillation ☆33 · Mar 24, 2023 · Updated 2 years ago
- DeLighT: Very Deep and Light-Weight Transformers ☆469 · Oct 16, 2020 · Updated 5 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- code and data for paper "One-shot Text Field Labeling using Attention and BeliefPropagation for Structure Information Extraction" ☆61 · Aug 9, 2020 · Updated 5 years ago
- Making BERT stretchy. Semantic Elasticsearch with Sentence Transformers ☆161 · Sep 25, 2020 · Updated 5 years ago
- SoT: Delving Deeper into Classification Head for Transformer ☆50 · Dec 24, 2021 · Updated 4 years ago
- ☆13 · Aug 4, 2021 · Updated 4 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- Repository for group 17 on the Statistical Natural Language Processing module at UCL ☆23 · Aug 23, 2021 · Updated 4 years ago
- Experiments with the ideas presented in https://arxiv.org/abs/2003.00152 by Frankle et al. ☆29 · Aug 21, 2020 · Updated 5 years ago
- Code for "Are labels necessary for neural architecture search" ☆92 · Mar 20, 2024 · Updated last year
- [CVPR 2020] When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks ☆125 · Oct 21, 2020 · Updated 5 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Feb 1, 2021 · Updated 5 years ago
- (ICCV 2021) BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search ☆142 · Dec 6, 2021 · Updated 4 years ago
- Low-code pre-built pipelines for experiments with huggingface/transformers for Data Scientists in a rush. ☆16 · Oct 14, 2020 · Updated 5 years ago
- The implementation of our paper: Towards Robust Vision Transformer (CVPR2022) ☆142 · Aug 16, 2022 · Updated 3 years ago
- Awesome Neural Adaptation in Natural Language Processing. A curated list. https://arxiv.org/abs/2006.00632 ☆265 · Jul 9, 2021 · Updated 4 years ago
- ☆16 · May 6, 2021 · Updated 4 years ago
- Jax implementation of the AdaHessian optimizer ☆20 · Mar 11, 2021 · Updated 4 years ago
- An adaptive training algorithm for residual network ☆17 · Aug 22, 2020 · Updated 5 years ago
- Code for "Understanding and Improving Layer Normalization" ☆46 · Dec 8, 2019 · Updated 6 years ago
- Reducing Channel Redundancy in Convolutional Neural Networks by Features Recombining (TIP 2021) ☆20 · Mar 1, 2023 · Updated 3 years ago
- ☆16 · Jan 7, 2021 · Updated 5 years ago
- ☆74 · Dec 8, 2022 · Updated 3 years ago
- Displaced Aggregation Units for Convolutional Networks from "Spatially-Adaptive Filter Units for Deep Neural Networks" paper ☆21 · Jun 27, 2024 · Updated last year
- ESRGAN E2E TFLite Tutorial ☆18 · Aug 3, 2020 · Updated 5 years ago
- The project is about predicting sets (of classes) from images. ☆23 · Aug 31, 2021 · Updated 4 years ago
- ☆17 · Sep 22, 2020 · Updated 5 years ago
- Repository for the paper "Optimal Subarchitecture Extraction for BERT" ☆470 · Jun 22, 2022 · Updated 3 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 4 years ago
- The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021). ☆224 · Jan 1, 2026 · Updated 2 months ago
- ☆19 · Jan 27, 2021 · Updated 5 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆113 · Jul 3, 2019 · Updated 6 years ago