[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120 · Jun 20, 2021 · Updated 4 years ago
Alternatives and similar repositories for powernorm
Users that are interested in powernorm are comparing it to the libraries listed below.
- Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View" ☆147 · Jun 10, 2019 · Updated 6 years ago
- Implementation for NATv2. ☆23 · Feb 20, 2021 · Updated 5 years ago
- ☆21 · Dec 30, 2022 · Updated 3 years ago
- JAX implementation of the AdaHessian optimizer ☆20 · Mar 11, 2021 · Updated 5 years ago
- DeLighT: Very Deep and Light-Weight Transformers ☆469 · Oct 16, 2020 · Updated 5 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 5 years ago
- Code for "Are labels necessary for neural architecture search" ☆92 · Mar 20, 2024 · Updated 2 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- Implementation of knapsack pruning ☆27 · May 26, 2020 · Updated 5 years ago
- The implementation of "Shallow-to-Deep Training for Neural Machine Translation" ☆10 · Oct 26, 2020 · Updated 5 years ago
- MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems ☆68 · Oct 26, 2021 · Updated 4 years ago
- ☆13 · Aug 4, 2021 · Updated 4 years ago
- Using PubMed to find out how a gene contributes to addiction. ☆20 · Dec 27, 2022 · Updated 3 years ago
- Making BERT stretchy. Semantic Elasticsearch with Sentence Transformers ☆161 · Sep 25, 2020 · Updated 5 years ago
- SoT: Delving Deeper into Classification Head for Transformer ☆50 · Dec 24, 2021 · Updated 4 years ago
- Repo for the EACL2017 tutorial on imitation learning ☆28 · Apr 3, 2017 · Updated 9 years ago
- Official Code for "Learning to Reason via Mixture-of-Thought for Logical Reasoning" ☆28 · Nov 20, 2025 · Updated 4 months ago
- code and data for paper "One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction" ☆61 · Aug 9, 2020 · Updated 5 years ago
- Code for "Understanding and Improving Layer Normalization" ☆46 · Dec 8, 2019 · Updated 6 years ago
- Reducing Channel Redundancy in Convolutional Neural Networks by Features Recombining (TIP 2021) ☆20 · Mar 1, 2023 · Updated 3 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Jul 3, 2019 · Updated 6 years ago
- The implementation of our paper: Towards Robust Vision Transformer (CVPR2022) ☆142 · Aug 16, 2022 · Updated 3 years ago
- [CVPR 2020] When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks ☆125 · Oct 21, 2020 · Updated 5 years ago
- Repository for group 17 on the Statistical Natural Language Processing module at UCL ☆23 · Aug 23, 2021 · Updated 4 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Jul 13, 2022 · Updated 3 years ago
- A fast and efficient way to compute a differentiable bound on the singular values of convolution layers ☆12 · Nov 22, 2019 · Updated 6 years ago
- Histopathologic Cancer Detection model based on Kaggle Challenge https://www.kaggle.com/c/histopathologic-cancer-detection (top 1%) ☆11 · Feb 16, 2021 · Updated 5 years ago
- [NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong,… ☆15 · Nov 27, 2023 · Updated 2 years ago
- Codes for NAACL 2021 paper 'Noisy Self-Knowledge Distillation for Text Summarization' ☆24 · Jul 27, 2021 · Updated 4 years ago
- Code for the paper "Adaptive Transformers for Learning Multimodal Representations" (ACL SRW 2020) ☆43 · Oct 20, 2022 · Updated 3 years ago
- ☆74 · Dec 8, 2022 · Updated 3 years ago
- ☆19 · Jan 27, 2021 · Updated 5 years ago
- ☆17 · Sep 22, 2020 · Updated 5 years ago
- Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth" ☆416 · Jul 25, 2024 · Updated last year
- Analyze AdaHessian optimizer on 2D functions. ☆13 · Aug 13, 2021 · Updated 4 years ago
- ☆11 · Jan 30, 2023 · Updated 3 years ago
- Low-variance and unbiased gradient for backpropagation through categorical random variables, with application in variational auto-encoder… ☆17 · Jul 1, 2020 · Updated 5 years ago
- PyTorch library for fast transformer implementations ☆1,767 · Mar 23, 2023 · Updated 3 years ago
- ☆246 · Jul 23, 2021 · Updated 4 years ago