Scripts to preprocess training and test data and to run fast_align and giza
☆107Nov 2, 2021Updated 4 years ago
Alternatives and similar repositories for alignment-scripts
Users who are interested in alignment-scripts compare it to the repositories listed below.
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Jun 18, 2024Updated last year
- Efficient Low-Memory Aligner☆146Jan 15, 2025Updated last year
- Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021☆61May 10, 2021Updated 4 years ago
- A neural word aligner based on multilingual BERT☆373Mar 10, 2022Updated 3 years ago
- Simple, fast unsupervised word aligner☆767Jul 19, 2022Updated 3 years ago
- Neural machine translation soft alignment visualisations for web and command line☆72Aug 19, 2021Updated 4 years ago
- OpusFilter - Parallel corpus processing toolkit☆115Feb 11, 2026Updated 2 weeks ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…☆81Aug 31, 2021Updated 4 years ago
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation☆16Oct 14, 2022Updated 3 years ago
- A word alignment tool based on the well-known GIZA++, extended to support multi-threading, resumed training, and incremental training.☆166May 12, 2021Updated 4 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)☆389Nov 7, 2023Updated 2 years ago
- ☆23Nov 15, 2022Updated 3 years ago
- Code for AAAI 2021 paper "Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance"☆25Dec 14, 2022Updated 3 years ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.☆126Oct 13, 2025Updated 4 months ago
- A High-Quality Multilingual Dataset for Structured Documentation Translation☆37May 1, 2025Updated 9 months ago
- Open-Source Machine Translation Quality Estimation in PyTorch☆232Jun 23, 2022Updated 3 years ago
- Japanese--Russian--English News Commentary Parallel Data☆18Jul 9, 2019Updated 6 years ago
- Word sense disambiguation test sets for NMT☆20Dec 3, 2020Updated 5 years ago
- Lexically Constrained Neural Machine Translation with Levenshtein Transformer☆40Jul 14, 2020Updated 5 years ago
- Python package to augment multilingual data☆15Feb 15, 2023Updated 3 years ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …☆15Aug 31, 2021Updated 4 years ago
- Python port of Moses tokenizer, truecaser and normalizer☆495Feb 6, 2026Updated 3 weeks ago
- A tool for holistic analysis of language generation systems☆471Sep 22, 2025Updated 5 months ago
- ☆42Jul 17, 2018Updated 7 years ago
- ☆20Aug 17, 2021Updated 4 years ago
- Post-editing Datasets by Rakuten (PEDRa)☆14Jun 23, 2021Updated 4 years ago
- Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and more.☆45Nov 2, 2022Updated 3 years ago
- Efficient Markov Chain word alignment☆53Aug 1, 2021Updated 4 years ago
- Scripts and noise data for Belinkov & Bisk 2018☆29Apr 27, 2018Updated 7 years ago
- A library for data streaming and augmentation☆21May 5, 2025Updated 9 months ago
- ☆28Oct 6, 2020Updated 5 years ago
- ☆36Aug 25, 2022Updated 3 years ago
- Improved Sentence Alignment in Linear Time and Space☆192Mar 6, 2023Updated 2 years ago
- NJUNMT for docNMT☆16Sep 9, 2020Updated 5 years ago
- ☆13Aug 23, 2024Updated last year
- Framework for neural-based Quality Estimation☆41Sep 23, 2020Updated 5 years ago
- eXtensible Neural Machine Translation☆186Sep 22, 2025Updated 5 months ago
- ☆25Oct 22, 2022Updated 3 years ago
- ☆38Jun 3, 2021Updated 4 years ago