☆93Feb 13, 2024Updated 2 years ago
Alternatives and similar repositories for opus-100-corpus
Users who are interested in opus-100-corpus are comparing it to the libraries listed below
- Zero -- A neural machine translation system☆153May 8, 2023Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora☆162Sep 18, 2025Updated 5 months ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …☆15Aug 31, 2021Updated 4 years ago
- Code and Data release for "Improving Multilingual Translation by Representation and Gradient Regularization" (Yang et al. EMNLP 2021), an…☆13Aug 12, 2024Updated last year
- A parallel evaluation data set of SAP software documentation with document structure annotation☆14Jul 30, 2025Updated 7 months ago
- Implementation of "Modeling Past and Future for Neural Machine Translation"☆15Mar 16, 2018Updated 7 years ago
- Cross-lingual GLUE☆49Jun 15, 2023Updated 2 years ago
- A High-Quality Multilingual Dataset for Structured Documentation Translation☆37May 1, 2025Updated 10 months ago
- Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation☆17Jan 18, 2021Updated 5 years ago
- Framework for neural-based Quality Estimation☆41Sep 23, 2020Updated 5 years ago
- PhD thesis (updating) of Jiatao Gu from HKU☆19Aug 10, 2018Updated 7 years ago
- This repo supports various cross-lingual transfer learning & multilingual NLP models.☆92Sep 13, 2023Updated 2 years ago
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)☆13Nov 21, 2021Updated 4 years ago
- Decoding platform for machine translation research☆54Aug 24, 2019Updated 6 years ago
- Code for EMNLP2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"☆20Nov 12, 2021Updated 4 years ago
- 🪱 PARASITE || A parallel sentence data preprocessing toolkit. Originally developed as a part of the `en-ru` winner submission of WMT20 B…☆11Jun 8, 2021Updated 4 years ago
- A masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…☆246Sep 17, 2021Updated 4 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Jun 18, 2024Updated last year
- Facebook Low Resource (FLoRes) MT Benchmark☆766Nov 20, 2023Updated 2 years ago
- A neural word aligner based on multilingual BERT☆373Mar 10, 2022Updated 3 years ago
- Neural machine translation soft alignment visualisations for web and command line☆72Aug 19, 2021Updated 4 years ago
- PyTorch implementation of Multimodal Neural Machine Translation (MNMT).☆12Jan 21, 2021Updated 5 years ago
- ☆15Nov 5, 2020Updated 5 years ago
- eXtensible Neural Machine Translation☆186Sep 22, 2025Updated 5 months ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing☆102Jul 25, 2024Updated last year
- Generative Flow based Sequence-to-Sequence Toolkit written in Python.☆247Jan 28, 2020Updated 6 years ago
- LASER multilingual sentence embeddings as a pip package☆224Aug 11, 2023Updated 2 years ago
- Open-Source Neural Machine Translation in TensorFlow☆802Dec 9, 2022Updated 3 years ago
- ☆12Dec 9, 2015Updated 10 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages☆15Apr 11, 2020Updated 5 years ago
- ☆86Dec 26, 2022Updated 3 years ago
- This repo is not maintained. For latest version, please visit https://github.com/ictnlp. A collection of transformer's guides, implementa…☆44Dec 5, 2018Updated 7 years ago
- Efficient Low-Memory Aligner☆146Jan 15, 2025Updated last year
- ☆17Apr 28, 2022Updated 3 years ago
- ☆15May 26, 2021Updated 4 years ago
- ☆13Dec 11, 2020Updated 5 years ago
- ☆846Aug 20, 2024Updated last year
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…☆651Jan 4, 2023Updated 3 years ago
- Python port of Moses tokenizer, truecaser and normalizer☆495Feb 6, 2026Updated 3 weeks ago