Helsinki-NLP/OpusTools

☆83

Alternatives and similar repositories for OpusTools

Users who are interested in OpusTools are comparing it to the libraries listed below.

Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 2 weeks ago
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
sortiz / tmxt
Transform TMX to text
☆27 · Nov 23, 2022 · Updated 3 years ago
dayeonki / mt_feedback
Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]
☆14 · Apr 3, 2026 · Updated 3 months ago
openlanguagedata / seed
Seed Machine Translation Data
☆34 · Nov 12, 2024 · Updated last year
Helsinki-NLP / FoTraNMT
Open Source Neural Machine Translation in PyTorch
☆13 · Apr 29, 2023 · Updated 3 years ago
Helsinki-NLP / OPUS
The Open Parallel Corpus
☆89 · Updated this week
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆165 · Apr 13, 2026 · Updated 3 months ago
bitextor / bifixer
Tool to fix bitexts and tag near-duplicates for removal
☆35 · Sep 4, 2025 · Updated 10 months ago
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
paracrawl / keops
Tool for manual evaluation of parallel sentences.
☆15 · Jan 26, 2026 · Updated 5 months ago
paracrawl / corset
Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data.
☆21 · Nov 6, 2023 · Updated 2 years ago
hangyav / UnsupPSE
Unsupervised parallel sentence extraction from comparable corpora
☆16 · Aug 6, 2019 · Updated 6 years ago
ymoslem / OpenNMT-Web-Interface
Machine Translation Web Interface for OpenNMT-py
☆26 · Dec 24, 2021 · Updated 4 years ago
TharinduDR / TransQuest
Transformer based translation quality estimation
☆114 · Jul 20, 2023 · Updated 3 years ago
Kartikaggarwal98 / Indian_ParallelCorpus
Curated list of publicly available parallel corpus for Indian Languages
☆36 · Jul 15, 2021 · Updated 5 years ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year
hplt-project / OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
☆58 · Feb 3, 2026 · Updated 5 months ago
MaLA-LM / GlotEval
GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way
☆18 · Nov 4, 2025 · Updated 8 months ago
neulab / awesome-align
A neural word aligner based on multilingual BERT
☆379 · Mar 10, 2022 · Updated 4 years ago
tsuruoka-lab / AMI-Meeting-Parallel-Corpus
AMI Meeting Parallel Corpus
☆13 · Dec 11, 2020 · Updated 5 years ago
NLP-Playground / LaSS
☆31 · Apr 27, 2022 · Updated 4 years ago
robertostling / eflomal
Efficient Low-Memory Aligner
☆148 · Jan 15, 2025 · Updated last year
wmt-conference / wmt21-news-systems
☆26 · Jan 9, 2023 · Updated 3 years ago
ymoslem / MT-Tools
Collection of Common Machine Translation Tools
☆11 · Jul 26, 2022 · Updated 3 years ago
paul-shen-stanford / ai-grammar-style-api
Multilingual AI style enhancement and grammar correction REST API. English, French, Spanish, Arabic, Japanese, Chinese. Based on deep NLP…
☆10 · Dec 22, 2019 · Updated 6 years ago
olastor / german-word-frequencies
Simple word to frequency mappings for the German language based on text corpora and using CISTEM stemmer.
☆14 · Apr 3, 2021 · Updated 5 years ago
OpenITI / RELEASE
OpenITI releases
☆61 · Feb 9, 2026 · Updated 5 months ago
anoopkunchukuttan / multinmt_tutorial_coling2020
Material for the COLING 2020 Tutorial on Multilingual NMT
☆16 · Dec 10, 2020 · Updated 5 years ago
Helsinki-NLP / OPUS-translator
Translation demonstrator
☆37 · May 12, 2020 · Updated 6 years ago
wmt-conference / wmt-format-tools
Tools for formatting WMT hypothesis and test sets in XML
☆27 · Apr 18, 2025 · Updated last year
j-min / WikiExtractor_To_the_one_text
Simple extension of WikiExtractor (https://github.com/attardi/wikiextractor)
☆16 · Dec 23, 2016 · Updated 9 years ago
hplt-project / sacremoses
Python port of Moses tokenizer, truecaser and normalizer
☆497 · Feb 6, 2026 · Updated 5 months ago
facebookresearch / mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…
☆81 · Aug 31, 2021 · Updated 4 years ago
NLP2CT / Meta-Curriculum
Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation (AAAI 2021)
☆26 · Jun 18, 2022 · Updated 4 years ago
Caucasus-Rosetta / Lingua-Corpus
Caucasus languages focused multilingual and monolingual corpuses for Natural Language Processing (NLP)
☆37 · Updated this week
ZacharySBrown / deep-learning-nlp-sais
☆17 · Apr 24, 2019 · Updated 7 years ago
bltlab / paranames
ParaNames: A multilingual resource for parallel names
☆40 · May 20, 2024 · Updated 2 years ago
deterministic-algorithms-lab / Jax-Journey
A pathway and collection of resources for learning Jax from beginner to advanced.
☆11 · Jan 2, 2021 · Updated 5 years ago