☆51 · Jul 25, 2024 · Updated last year
Alternatives and similar repositories for ersatz
Users who are interested in ersatz compare it to the libraries listed below.
- Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation ☆15 · Dec 2, 2016 · Updated 9 years ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆51 · Apr 22, 2025 · Updated 10 months ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- A tool that locates, downloads, and extracts machine translation corpora ☆162 · Sep 18, 2025 · Updated 5 months ago
- List of corpora annotated for coreference for different languages ☆17 · Aug 8, 2024 · Updated last year
- ☆34 · Nov 22, 2021 · Updated 4 years ago
- phone inventory library ☆17 · May 15, 2023 · Updated 2 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages ☆15 · Apr 11, 2020 · Updated 5 years ago
- Fast, permanent and flexible patterns for sharing and computing on texts with metadata using Apache Arrow. ☆15 · Mar 1, 2022 · Updated 4 years ago
- ☆12 · Dec 9, 2015 · Updated 10 years ago
- Open-Source Machine Translation Quality Estimation in PyTorch ☆232 · Jun 23, 2022 · Updated 3 years ago
- Efficient Low-Memory Aligner ☆146 · Jan 15, 2025 · Updated last year
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Jun 18, 2024 · Updated last year
- ☆17 · Feb 1, 2023 · Updated 3 years ago
- Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations ☆23 · Nov 5, 2025 · Updated 3 months ago
- ☆25 · May 11, 2024 · Updated last year
- Transform TMX to text ☆28 · Nov 23, 2022 · Updated 3 years ago
- Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020" ☆23 · May 26, 2021 · Updated 4 years ago
- A repository with the code related to experiments around context-aware machine translation ☆51 · Sep 22, 2025 · Updated 5 months ago
- ☆93 · Feb 13, 2024 · Updated 2 years ago
- ☆21 · Feb 13, 2023 · Updated 3 years ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆27 · Aug 11, 2024 · Updated last year
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆25 · Nov 27, 2021 · Updated 4 years ago
- ☆26 · Jul 30, 2024 · Updated last year
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Jul 25, 2024 · Updated last year
- Facebook Low Resource (FLoRes) MT Benchmark ☆765 · Nov 20, 2023 · Updated 2 years ago
- OpusFilter - Parallel corpus processing toolkit ☆115 · Feb 11, 2026 · Updated 2 weeks ago
- Best Practices in Translation Memory Management ☆47 · Dec 14, 2018 · Updated 7 years ago
- Transformer based translation quality estimation ☆114 · Jul 20, 2023 · Updated 2 years ago
- Improved Sentence Alignment in Linear Time and Space ☆192 · Mar 6, 2023 · Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆389 · Nov 7, 2023 · Updated 2 years ago
- A neural word aligner based on multilingual BERT ☆373 · Mar 10, 2022 · Updated 3 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆74 · Feb 25, 2015 · Updated 11 years ago
- Library for pruning experts per language pair in NLLB-200 ☆34 · Jul 7, 2023 · Updated 2 years ago
- Machine-Translation-based sentence alignment tool for parallel text ☆315 · Mar 18, 2021 · Updated 4 years ago
- NTREX -- News Test References for MT Evaluation ☆88 · Jun 5, 2024 · Updated last year
- Workshop: Using R/tidyverse to analyze & visualize gapminder/processed transcriptomics data! ☆13 · Sep 12, 2025 · Updated 5 months ago
- Text Normalization utilities for normalizing text for TTS ☆21 · Updated this week