bitextor/bicleaner

Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

☆160

Alternatives and similar repositories for bicleaner

Users that are interested in bicleaner are comparing it to the libraries listed below.

bitextor / bifixer
Tool to fix bitexts and tag near-duplicates for removal
☆35 · Sep 4, 2025 · Updated 10 months ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year
sortiz / tmxt
Transform TMX to text
☆27 · Nov 23, 2022 · Updated 3 years ago
robertostling / eflomal
Efficient Low-Memory Aligner
☆148 · Jan 15, 2025 · Updated last year
bitextor / bicleaner-ai
Bicleaner fork that uses neural networks
☆40 · Feb 23, 2026 · Updated 4 months ago
lilt / alignment-scripts
Scripts to preprocess training and test data and to run fast_align and giza
☆107 · Nov 2, 2021 · Updated 4 years ago
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 2 weeks ago
hainan-xv / zipporah
☆42 · Jul 17, 2018 · Updated 8 years ago
M4t1ss / SoftAlignments
Neural machine translation soft alignment visualisations for web and command line
☆73 · Aug 19, 2021 · Updated 4 years ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆165 · Apr 13, 2026 · Updated 3 months ago
M4t1ss / parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
☆42 · Dec 19, 2023 · Updated 2 years ago
Prompsit / mutnmt
An educational tool to train, inspect, evaluate and translate using neural engines
☆20 · Mar 13, 2025 · Updated last year
bicici / FDA
Feature Decay Algorithms
☆11 · Mar 5, 2014 · Updated 12 years ago
thompsonb / vecalign
Improved Sentence Alignment in Linear Time and Space
☆200 · Jul 4, 2026 · Updated 2 weeks ago
facebookresearch / mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…
☆81 · Aug 31, 2021 · Updated 4 years ago
neulab / awesome-align
A neural word aligner based on multilingual BERT
☆379 · Mar 10, 2022 · Updated 4 years ago
bitextor / warc2text
Extracts plain text, language identification and more metadata from WARC records
☆23 · Apr 16, 2026 · Updated 3 months ago
mbanon / fastspell
Targeted language identifier, based on FastText and Hunspell.
☆38 · Sep 4, 2025 · Updated 10 months ago
cisnlp / parcoure
ParCourE - Parallel Corpus Explorer
☆12 · Dec 27, 2021 · Updated 4 years ago
rsennrich / Bleualign
Machine-Translation-based sentence alignment tool for parallel text
☆316 · Mar 18, 2021 · Updated 5 years ago
qe-team / marmot
MARMOT - the open source framework for feature extraction and machine learning, designed to estimate the quality of Machine Translation o…
☆22 · Oct 29, 2017 · Updated 8 years ago
hlt-mt / TMOP
Translation Memory Open-source Purifier
☆35 · Nov 6, 2022 · Updated 3 years ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆87 · Jun 5, 2024 · Updated 2 years ago
Unbabel / OpenKiwi
Open-Source Machine Translation Quality Estimation in PyTorch
☆233 · Jun 23, 2022 · Updated 4 years ago
paracrawl / extractor
☆24 · Nov 29, 2017 · Updated 8 years ago
hplt-project / data-analytics-tool
HPLT Analytics
☆15 · Updated this week
braunefe / Gargantua
☆12 · Dec 9, 2015 · Updated 10 years ago
paracrawl / keops
Tool for manual evaluation of parallel sentences.
☆15 · Jan 26, 2026 · Updated 5 months ago
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 · Nov 7, 2023 · Updated 2 years ago
jhclark / tercom
Translation Error Rate (TER)
☆44 · May 25, 2018 · Updated 8 years ago
clab / fast_align
Simple, fast unsupervised word aligner
☆769 · Jul 19, 2022 · Updated 4 years ago
TharinduDR / TransQuest
Transformer based translation quality estimation
☆114 · Jul 20, 2023 · Updated 3 years ago
marliesvanderwees / dds-nmt
Dynamic data selection for neural machine translation
☆20 · Jan 28, 2018 · Updated 8 years ago
salesforce / localization-xml-mt
A High-Quality Multilingual Dataset for Structured Documentation Translation
☆39 · May 1, 2025 · Updated last year
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆309 · Updated this week
kpu / preprocess
Corpus preprocessing
☆100 · Mar 16, 2024 · Updated 2 years ago
Unbabel / COMET
A Neural Framework for MT Evaluation
☆768 · Apr 21, 2026 · Updated 3 months ago
browsermt / students
Efficient teacher-student models and scripts to make them
☆57 · Dec 16, 2023 · Updated 2 years ago
facebookresearch / evaluation-of-nmt-bt
This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …
☆15 · Aug 31, 2021 · Updated 4 years ago