facebookresearch/stopes


A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

☆309

Alternatives and similar repositories for stopes

Users who are interested in stopes are comparing it to the libraries listed below.

thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆165 · Apr 13, 2026 · Updated 3 months ago
yilinyang7 / fairseq_multi_fix
Code and Data release for "Improving Multilingual Translation by Representation and Gradient Regularization" (Yang et al. EMNLP 2021), an…
☆13 · Aug 12, 2024 · Updated last year
facebookresearch / flores
Facebook Low Resource (FLoRes) MT Benchmark
☆771 · Nov 20, 2023 · Updated 2 years ago
fe1ixxu / Intra-Distillation
This is the repository for our EMNLP 2022 paper "The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains".
☆10 · Jun 2, 2023 · Updated 3 years ago
facebookresearch / fairseq2
FAIR Sequence Modeling Toolkit 2
☆1,142 · Updated this week
ymoslem / MT-Tools
Collection of Common Machine Translation Tools
☆11 · Jul 26, 2022 · Updated 3 years ago
Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 2 weeks ago
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆87 · Jun 5, 2024 · Updated 2 years ago
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 · Nov 7, 2023 · Updated 2 years ago
uds-lsv / afro-maft
☆17 · Jan 12, 2023 · Updated 3 years ago
facebookresearch / SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
☆898 · Oct 10, 2025 · Updated 9 months ago
ZurichNLP / ContraDecode
The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…
☆38 · Aug 29, 2025 · Updated 10 months ago
google-research / mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
☆132 · Apr 23, 2026 · Updated 2 months ago
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
pluiez / NLLB-inference
☆56 · Jul 16, 2022 · Updated 4 years ago
Unbabel / COMET
A Neural Framework for MT Evaluation
☆768 · Apr 21, 2026 · Updated 3 months ago
cisnlp / Glot500
[ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
☆107 · Apr 14, 2026 · Updated 3 months ago
hsing-wang / Awesome-LLM-MT
☆254 · May 30, 2024 · Updated 2 years ago
bzhangGo / zero
Zero -- A neural machine translation system
☆152 · May 8, 2023 · Updated 3 years ago
neulab / contextual-mt
A repository with the code related to experiments around context-aware machine translation
☆51 · Sep 22, 2025 · Updated 9 months ago
THUNLP-MT / Template-NMT
☆23 · Nov 15, 2022 · Updated 3 years ago
openlanguagedata / seed
Seed Machine Translation Data
☆34 · Nov 12, 2024 · Updated last year
thompsonb / vecalign
Improved Sentence Alignment in Linear Time and Space
☆200 · Jul 4, 2026 · Updated 2 weeks ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆29 · Apr 2, 2022 · Updated 4 years ago
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆42 · Aug 3, 2023 · Updated 2 years ago
google-research / bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
☆794 · Aug 4, 2023 · Updated 2 years ago
fe1ixxu / ALMA
State-of-the-art LLM-based translation models.
☆590 · Apr 9, 2025 · Updated last year
google-research-datasets / TF-IDF-IIF-top100-wordlists
These are lists for a variety of languages containing words that are distinctive to each language.
☆42 · Apr 5, 2022 · Updated 4 years ago
robertostling / eflomal
Efficient Low-Memory Aligner
☆148 · Jan 15, 2025 · Updated last year
wxjiao / ParroT
The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1…
☆177 · Dec 31, 2024 · Updated last year
xuchennlp / S2T
The project for speech translation
☆12 · Sep 28, 2023 · Updated 2 years ago
facebookresearch / covost
CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed)
☆401 · Sep 14, 2021 · Updated 4 years ago
gpengzhi / CrossConST-MT
Code for Findings of ACL 2023 paper "Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency …
☆10 · Jul 18, 2023 · Updated 3 years ago
facebookresearch / SimulEval
SimulEval: A General Evaluation Toolkit for Simultaneous Translation
☆126 · Sep 13, 2024 · Updated last year
ictnlp / FA-DAT
Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"
☆14 · Mar 1, 2023 · Updated 3 years ago
sunzewei2715 / Graformer
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models
☆24 · Sep 22, 2021 · Updated 4 years ago
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
asappresearch / wav2seq
Official code for Wav2Seq
☆97 · Jul 19, 2022 · Updated 4 years ago