hplt-project/sacremoses

Python port of Moses tokenizer, truecaser and normalizer

☆497

Alternatives and similar repositories for sacremoses

Users that are interested in sacremoses are comparing it to the libraries listed below.

mjpost / sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
☆1,254 · Jul 17, 2026 · Updated last week
rsennrich / subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
☆2,271 · Aug 7, 2024 · Updated last year
glample / fastBPE
Fast BPE
☆677 · Jun 18, 2024 · Updated 2 years ago
clab / fast_align
Simple, fast unsupervised word aligner
☆769 · Jul 19, 2022 · Updated 4 years ago
moses-smt / mosesdecoder
Moses, the machine translation system
☆1,625 · Mar 28, 2025 · Updated last year
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆167 · Apr 13, 2026 · Updated 3 months ago
lilt / alignment-scripts
Scripts to preprocess training and test data and to run fast_align and giza
☆107 · Nov 2, 2021 · Updated 4 years ago
Unbabel / OpenKiwi
Open-Source Machine Translation Quality Estimation in PyTorch
☆233 · Jun 23, 2022 · Updated 4 years ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 10 months ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year
marian-nmt / marian
Fast Neural Machine Translation in C++
☆1,462 · Aug 25, 2023 · Updated 2 years ago
luismsgomes / mosestokenizer
☆20 · Oct 22, 2021 · Updated 4 years ago
facebookresearch / mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…
☆81 · Aug 31, 2021 · Updated 4 years ago
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
neulab / awesome-align
A neural word aligner based on multilingual BERT
☆379 · Mar 10, 2022 · Updated 4 years ago
EdinburghNLP / nematus
Open-Source Neural Machine Translation in Tensorflow
☆805 · Dec 9, 2022 · Updated 3 years ago
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 3 weeks ago
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 · Nov 7, 2023 · Updated 2 years ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,923 · Feb 14, 2023 · Updated 3 years ago
facebookresearch / flores
Facebook Low Resource (FLoRes) MT Benchmark
☆771 · Nov 20, 2023 · Updated 2 years ago
THUNLP-MT / MT-Reading-List
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
☆2,435 · Aug 9, 2024 · Updated last year
thompsonb / prism
MT Evaluation in Many Languages via Zero-Shot Paraphrasing
☆102 · Jul 25, 2024 · Updated 2 years ago
mingruimingrui / fast-mosestokenizer
C++ mosestokenizer
☆18 · Mar 13, 2024 · Updated 2 years ago
moses-smt / mgiza
A word alignment tool based on famous GIZA++, extended to support multi-threading, resume training and incremental training.
☆167 · May 12, 2021 · Updated 5 years ago
google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
☆11,983 · Updated this week
facebookresearch / vizseq
An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)
☆452 · Jul 17, 2026 · Updated last week
google-research / bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
☆794 · Aug 4, 2023 · Updated 2 years ago
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
joeynmt / joeynmt
Minimalist NMT for educational purposes
☆710 · Jan 29, 2024 · Updated 2 years ago
artetxem / monoses
Unsupervised Statistical Machine Translation
☆232 · Aug 30, 2020 · Updated 5 years ago
rsennrich / wmt16-scripts
Scripts and configuration files for Edinburgh neural MT submission to WMT 16 shared translation task
☆139 · Nov 5, 2020 · Updated 5 years ago
OpenNMT / OpenNMT-py
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
☆7,010 · Oct 14, 2025 · Updated 9 months ago
nreimers / truecaser
Language independent truecaser in Python.
☆160 · Oct 17, 2021 · Updated 4 years ago
EdinburghNLP / opus-100-corpus
☆93 · Feb 13, 2024 · Updated 2 years ago
awslabs / sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch
☆1,215 · Oct 24, 2024 · Updated last year
jhclark / multeval
Easy Bootstrap Resampling and Approximate Randomization for BLEU, METEOR, and TER using Multiple Optimizer Runs. This implements "Better …
☆205 · Feb 25, 2023 · Updated 3 years ago
rbawden / discourse-mt-test-sets
☆29 · Jun 10, 2024 · Updated 2 years ago
artetxem / vecmap
A framework to learn cross-lingual word embedding mappings
☆656 · Apr 22, 2023 · Updated 3 years ago
neulab / word-embeddings-for-nmt
Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018
☆123 · Sep 22, 2025 · Updated 10 months ago