tnq177 / transformers_without_tears
Transformers without Tears: Improving the Normalization of Self-Attention
☆134May 29, 2024Updated last year
Alternatives and similar repositories for transformers_without_tears
Users that are interested in transformers_without_tears are comparing it to the libraries listed below
- Neural Machine Translation system for English to Vietnamese (IWSLT'15 English-Vietnamese data)☆62Jul 22, 2019Updated 6 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 2 years ago
- Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"☆12Dec 8, 2024Updated last year
- Witwicky: An implementation of Transformer in PyTorch.☆22Aug 17, 2020Updated 5 years ago
- [ACL'20] Highway Transformer: A Gated Transformer.☆33Dec 5, 2021Updated 4 years ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …☆15Aug 31, 2021Updated 4 years ago
- A long version of BART model based on Longformer model☆24Jun 12, 2023Updated 2 years ago
- Understanding the Difficulty of Training Transformers☆332May 31, 2022Updated 3 years ago
- A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…☆246Sep 17, 2021Updated 4 years ago
- Fully featured implementation of Routing Transformer☆300Nov 6, 2021Updated 4 years ago
- Instruction to data diversification☆24Nov 24, 2020Updated 5 years ago
- ☆44Sep 16, 2020Updated 5 years ago
- PyTorch implementation of "A Probabilistic Formulation of Unsupervised Text Style Transfer" by He et al. at ICLR 2020☆162Oct 19, 2022Updated 3 years ago
- STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION☆16Jun 5, 2018Updated 7 years ago
- Some good (maybe) papers about NMT (Neural Machine Translation).☆85Jan 15, 2020Updated 6 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…☆359Feb 22, 2022Updated 3 years ago
- Domain Adaptive Text Style Transfer, EMNLP 2019☆70Oct 15, 2019Updated 6 years ago
- ☆23Oct 30, 2023Updated 2 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆21Nov 28, 2022Updated 3 years ago
- ☆15Dec 5, 2019Updated 6 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain☆34Oct 30, 2020Updated 5 years ago
- Scripts and noise data for Belinkov & Bisk 2018☆29Apr 27, 2018Updated 7 years ago
- ☆32Sep 27, 2021Updated 4 years ago
- Cascaded Text Generation with Markov Transformers☆130Mar 20, 2023Updated 2 years ago
- DisCo Transformer for Non-autoregressive MT☆77Jul 28, 2022Updated 3 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation☆14Jul 30, 2025Updated 6 months ago
- DeLighT: Very Deep and Light-Weight Transformers☆469Oct 16, 2020Updated 5 years ago
- Generative Flow based Sequence-to-Sequence Toolkit written in Python.☆247Jan 28, 2020Updated 6 years ago
- Latent Alignment and Variational Attention☆328Nov 5, 2018Updated 7 years ago
- Multi-lingual & multi-domain (specialisation for biomedical data) translation model☆40Nov 17, 2020Updated 5 years ago
- ☆63Nov 27, 2022Updated 3 years ago
- Code and Data for ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"☆190May 23, 2025Updated 8 months ago
- Sequence-Level Mixed Sample Data Augmentation☆22Mar 7, 2021Updated 4 years ago
- PyTorch implementation of L2R2 in SIGIR 2020☆17Jun 12, 2023Updated 2 years ago
- An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)☆451Jan 28, 2026Updated 2 weeks ago
- LM pretraining for generation, reading list, resources, conference mappings.☆20Feb 25, 2020Updated 5 years ago
- NER task for Naver NLP Challenge 2018 (3rd Place)☆18Mar 24, 2023Updated 2 years ago
- Data for the ACL SRW 2020 paper "Understanding Points of Correspondence between Sentences for Abstractive Summarization"☆20Nov 2, 2022Updated 3 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"☆89Feb 1, 2021Updated 5 years ago