zhuohan123 / macaron-net
Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆147 · Jun 10, 2019 · Updated 6 years ago
Alternatives and similar repositories for macaron-net
Users who are interested in macaron-net are comparing it to the libraries listed below.
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆113 · Jul 3, 2019 · Updated 6 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆120 · Jun 20, 2021 · Updated 4 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆22 · Jan 25, 2023 · Updated 3 years ago
- ACL19_Depth_Growing_for_Neural_Machine_Translation ☆23 · Jul 6, 2019 · Updated 6 years ago
- Code for the paper Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems (ACL19) ☆100 · Oct 17, 2022 · Updated 3 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 4 years ago
- Tensorflow Source code for "Recurrently Controlled Recurrent Networks" (NIPS 2018) ☆23 · Oct 25, 2018 · Updated 7 years ago
- Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks" ☆580 · Aug 28, 2019 · Updated 6 years ago
- code for Explicit Sparse Transformer ☆61 · Jul 21, 2023 · Updated 2 years ago
- Transformer training code for sequential tasks ☆610 · Sep 14, 2021 · Updated 4 years ago
- Paper List For Linking ODE and Deep Learning ☆246 · Feb 18, 2020 · Updated 5 years ago
- Code for the article "Automatic Temperature Control for Neural Machine Translation" (EMNLP 2018) ☆14 · Apr 16, 2019 · Updated 6 years ago
- The implementation of "Learning Deep Transformer Models for Machine Translation" ☆116 · Jul 25, 2024 · Updated last year
- Some good(maybe) papers about NMT (Neural Machine Translation). ☆85 · Jan 15, 2020 · Updated 6 years ago
- Experiments with Neural ODEs and Adversarial Attacks ☆44 · Jan 13, 2019 · Updated 7 years ago
- Understanding the Difficulty of Training Transformers ☆332 · May 31, 2022 · Updated 3 years ago
- Repository for ACL 2019 paper ☆74 · Jun 30, 2019 · Updated 6 years ago
- ☆10 · Feb 12, 2020 · Updated 6 years ago
- Codes for "Towards Binary-Valued Gates for Robust LSTM Training". ☆75 · Jul 22, 2018 · Updated 7 years ago
- ☆57 · Oct 6, 2021 · Updated 4 years ago
- Code, data, and additional analysis for the paper Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evalua… ☆15 · Aug 13, 2020 · Updated 5 years ago
- Code for ACL2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation" ☆39 · Jun 24, 2020 · Updated 5 years ago
- Pytorch library for fast transformer implementations ☆1,761 · Mar 23, 2023 · Updated 2 years ago
- STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION ☆16 · Jun 5, 2018 · Updated 7 years ago
- Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms ☆20 · Nov 29, 2021 · Updated 4 years ago
- ☆14 · Nov 16, 2022 · Updated 3 years ago
- wake-up word emotion recognition [APSIPA 2022] ☆17 · Nov 11, 2022 · Updated 3 years ago
- Experiments from the paper "On Second Order Behaviour in Augmented Neural ODEs" ☆61 · Sep 30, 2024 · Updated last year
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,924 · Feb 14, 2023 · Updated 2 years ago
- ☆20 · Feb 26, 2021 · Updated 4 years ago
- MASS: Masked Sequence to Sequence Pre-training for Language Generation ☆1,123 · Nov 28, 2022 · Updated 3 years ago
- The entmax mapping and its loss, a family of sparse softmax alternatives. ☆459 · Jun 22, 2024 · Updated last year
- Code for our nips19 paper: You Only Propagate Once: Accelerating Adversarial Training Via Maximal Principle ☆179 · Jul 25, 2024 · Updated last year
- code for paper "Improving Sequence-to-Sequence Learning via Optimal Transport" ☆68 · Jun 24, 2019 · Updated 6 years ago
- ICLR2020 Downloader & Search Tool ☆18 · Oct 8, 2019 · Updated 6 years ago
- Butterfly matrix multiplication in PyTorch ☆178 · Oct 5, 2023 · Updated 2 years ago
- AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR) ☆23 · Oct 15, 2024 · Updated last year
- A tensorflow implementation of the NIPS 2018 paper "Variational Inference with Tail-adaptive f-Divergence" ☆20 · Jan 11, 2019 · Updated 7 years ago
- Learnable Embedding Space for Efficient Neural Architecture Compression ☆29 · Apr 25, 2019 · Updated 6 years ago