google-research/mt-metrics-eval

Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.

☆132

Alternatives and similar repositories for mt-metrics-eval

Users who are interested in mt-metrics-eval are comparing it to the repositories listed below.

google / wmt-mqm-human-evaluation
☆100 · Sep 25, 2025 · Updated 10 months ago
Unbabel / COMET
A Neural Framework for MT Evaluation
☆770 · Apr 21, 2026 · Updated 3 months ago
Coldmist-Lu / MQM_APE
[MQM-APE] Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators.
☆12 · Sep 24, 2024 · Updated last year
wmt-conference / wmt22-news-systems
☆21 · Feb 13, 2023 · Updated 3 years ago
MicrosoftTranslator / GEMBA
GEMBA: GPT Estimation Metric Based Assessment
☆152 · Dec 15, 2025 · Updated 7 months ago
MicrosoftTranslator / ToShipOrNotToShip
☆19 · Dec 16, 2024 · Updated last year
google-research / metricx
☆146 · Jul 2, 2026 · Updated 3 weeks ago
AppraiseDev / Appraise
Appraise code used as part of the WMT21 human evaluation campaign
☆30 · Jul 15, 2026 · Updated last week
zouharvi / subset2evaluate
Find informative examples to efficiently (human)-evaluate NLG models.
☆17 · Apr 22, 2026 · Updated 3 months ago
Smu-Tan / Remedy
[EMNLP2025] Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
☆16 · Nov 20, 2025 · Updated 8 months ago
dayeonki / mt_feedback
Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]
☆14 · Apr 3, 2026 · Updated 3 months ago
sheffieldnlp / mlqe-pe
Multilingual Quality Estimation and Automatic Post-editing Dataset
☆44 · Mar 24, 2022 · Updated 4 years ago
marzenakrp / LiteraryTranslation
☆24 · Apr 2, 2024 · Updated 2 years ago
wmt-conference / wmt25-general-mt
☆17 · Nov 19, 2025 · Updated 8 months ago
marzenakrp / demetr
Repository for DEMETR: Diagnosing Evaluation Metrics for Translation
☆17 · Nov 29, 2022 · Updated 3 years ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆167 · Apr 13, 2026 · Updated 3 months ago
neulab / contextual-mt
A repository with the code related to experiments around context-aware machine translation
☆51 · Sep 22, 2025 · Updated 10 months ago
Coldmist-Lu / ErrorAnalysis_Prompt
[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT
☆91 · Oct 14, 2025 · Updated 9 months ago
hsing-wang / Awesome-LLM-MT
☆254 · May 30, 2024 · Updated 2 years ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆87 · Jun 5, 2024 · Updated 2 years ago
wmt-conference / wmt-format-tools
Tools for formatting WMT hypothesis and test sets in XML
☆27 · Apr 18, 2025 · Updated last year
AIPHES / ACL20-Reference-Free-MT-Evaluation
Reference-free MT Evaluation Metrics
☆20 · Sep 24, 2022 · Updated 3 years ago
lucadiliello / bleurt-pytorch
BLEURT implementation in PyTorch
☆38 · Jan 19, 2023 · Updated 3 years ago
rbawden / discourse-mt-test-sets
☆29 · Jun 10, 2024 · Updated 2 years ago
rbawden / mt-bigscience
Evaluation results for Machine Translation within the BigScience project
☆11 · May 15, 2023 · Updated 3 years ago
lilt / alignment-scripts
Scripts to preprocess training and test data and to run fast_align and giza
☆107 · Nov 2, 2021 · Updated 4 years ago
ZurichNLP / coverage-contrastive-conditioning
Data and code accompanying the paper "As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive…
☆22 · Apr 13, 2023 · Updated 3 years ago
JDEA-NLP / Vega-MT
[WMT 2022 champion system] Vega-MT model and inference scripts
☆41 · Feb 10, 2023 · Updated 3 years ago
zerocstaker / constrained_ape
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆12 · Oct 10, 2020 · Updated 5 years ago
deep-spin / tower-eval
☆29 · Nov 14, 2025 · Updated 8 months ago
masakhane-io / africomet
COMET for African languages
☆11 · Jan 24, 2025 · Updated last year
ymoslem / MT-Tools
Collection of Common Machine Translation Tools
☆11 · Jul 26, 2022 · Updated 3 years ago
Unbabel / OpenKiwi
Open-Source Machine Translation Quality Estimation in PyTorch
☆233 · Jun 23, 2022 · Updated 4 years ago
xu1998hz / InstructScore_SEScore3
First explanation metric (diagnostic report) for text generation evaluation
☆62 · Mar 3, 2025 · Updated last year
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
Unbabel / MT-Telescope
☆33 · Nov 22, 2021 · Updated 4 years ago
google-research / bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
☆794 · Aug 4, 2023 · Updated 2 years ago
marian-nmt / sotastream
A library for data streaming and augmentation
☆22 · May 5, 2025 · Updated last year
katherinethai / par3
☆29 · Dec 2, 2024 · Updated last year