MicrosoftTranslator/NTREX

NTREX -- News Test References for MT Evaluation

☆88

Alternatives and similar repositories for NTREX

Users who are interested in NTREX are comparing it to the repositories listed below.

openlanguagedata / flores
The FLORES+ Machine Translation Benchmark
☆111 · Nov 12, 2024 · Updated last year
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Feb 11, 2026 · Updated 2 weeks ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆162 · Sep 18, 2025 · Updated 5 months ago
alirezamshi / small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…
☆28 · Feb 8, 2023 · Updated 3 years ago
langtech-bsc / mt-evaluation
A framework for evaluating Machine Translation models.
☆12 · May 26, 2025 · Updated 9 months ago
google-research / mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
☆126 · Oct 13, 2025 · Updated 4 months ago
wmt-conference / wmt-format-tools
Tools for formatting WMT hypothesis and test sets in XML
☆27 · Apr 18, 2025 · Updated 10 months ago
uds-lsv / afro-maft
☆17 · Jan 12, 2023 · Updated 3 years ago
hsing-wang / Awesome-LLM-MT
☆254 · May 30, 2024 · Updated last year
AppraiseDev / OCELoT
Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations
☆23 · Nov 5, 2025 · Updated 3 months ago
cisnlp / parcoure
ParCourE - Parallel Corpus Explorer
☆12 · Dec 27, 2021 · Updated 4 years ago
alirezamshi-zz / small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…
☆25 · Nov 4, 2022 · Updated 3 years ago
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated last year
facebookresearch / flores
Facebook Low Resource (FLoRes) MT Benchmark
☆765 · Nov 20, 2023 · Updated 2 years ago
lt3 / nfr
Neural Fuzzy Repair (NFR) is a data augmentation pipeline, which integrates fuzzy matches (i.e. similar translations) into neural machine…
☆12 · Aug 14, 2024 · Updated last year
masakhane-io / lafand-mt
MAFAND-MT
☆61 · Jul 9, 2024 · Updated last year
Unbabel / COMET
A Neural Framework for MT Evaluation
☆717 · Feb 5, 2026 · Updated 3 weeks ago
duyichao / MINETrans-IWSLT23
Official implementation of our IWSLT 2023 paper "The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Tra…
☆16 · Jul 14, 2023 · Updated 2 years ago
gauthelo / kallaama-speech-dataset
A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.
☆18 · Apr 29, 2024 · Updated last year
dayeonki / mt_feedback
(NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
☆15 · Apr 14, 2025 · Updated 10 months ago
xinjli / phonepiece
Phone inventory library
☆17 · May 15, 2023 · Updated 2 years ago
google-research / metricx
☆133 · Jan 22, 2026 · Updated last month
neulab / AfricanVoices
Hosts text-to-speech corpus and speech synthesizers for African languages.
☆18 · May 31, 2023 · Updated 2 years ago
google-research / url-nlp
☆266 · Aug 1, 2025 · Updated 7 months ago
ZurichNLP / ContraDecode
The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…
☆36 · Aug 29, 2025 · Updated 6 months ago
mayhewsw / multilingual-data-stats
Statistics on multilingual datasets
☆17 · Jul 12, 2022 · Updated 3 years ago
EleanorJiang / BlonDe
Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …
☆82 · Sep 21, 2023 · Updated 2 years ago
SAP / software-documentation-data-set-for-machine-translation
A parallel evaluation data set of SAP software documentation with document structure annotation
☆14 · Jul 30, 2025 · Updated 7 months ago
luismsgomes / mosestokenizer
☆20 · Oct 22, 2021 · Updated 4 years ago
cisnlp / simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆389 · Nov 7, 2023 · Updated 2 years ago
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆297 · Updated this week
zwhe99 / WMT22-En-Liv
[WMT 2022] Implementation of TAL-SJTU's system for WMT22 English-Livonian
☆23 · May 4, 2023 · Updated 2 years ago
THUNLP-MT / Mask-Align
Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021
☆61 · May 10, 2021 · Updated 4 years ago
salesforce / localization-xml-mt
A High-Quality Multilingual Dataset for Structured Documentation Translation
☆37 · May 1, 2025 · Updated 10 months ago
sortiz / tmxt
Transform TMX to text
☆28 · Nov 23, 2022 · Updated 3 years ago
derekgreene / topicscan
TopicScan: Visualization and validation interface for NMF Topic Modeling
☆23 · Jul 23, 2020 · Updated 5 years ago
sanchit-gandhi / codesnippets
☆10 · Apr 3, 2024 · Updated last year
guijinSON / MM-Eval
Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"
☆18 · Oct 26, 2024 · Updated last year
SunbowLiu / PTvsBT
On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)
☆13 · Nov 21, 2021 · Updated 4 years ago