paracrawl / keops
Tool for manual evaluation of parallel sentences.
☆14 Updated 11 months ago
Related projects:
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated last month
- Transform TMX to text ☆29 Updated last year
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆19 Updated 2 years ago
- ☆13 Updated 3 years ago
- Efficient Low-Memory Aligner ☆135 Updated 2 weeks ago
- NTREX -- News Test References for MT Evaluation ☆73 Updated 3 months ago
- Data collection, alignment and TAUS repository ☆20 Updated 6 years ago
- Corset is a web-based data selection portal that helps you get relevant data from massive amounts of parallel data. ☆17 Updated 10 months ago
- Sentence aligner ☆106 Updated 3 years ago
- OpusFilter - Parallel corpus processing toolkit ☆101 Updated last month
- ☆21 Updated 4 years ago
- ☆42 Updated 6 years ago
- ☆12 Updated 8 years ago
- Lists for a variety of languages containing words that are distinctive to each language. ☆34 Updated 2 years ago
- ☆67 Updated last month
- A library for data streaming and augmentation ☆20 Updated 6 months ago
- Bicleaner fork that uses neural networks ☆37 Updated last month
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆25 Updated 4 years ago
- Runnable morphological analysis tools from the UniMorph project ☆14 Updated 5 years ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆149 Updated last year
- Multilingual Open Text ☆25 Updated 5 months ago
- Automatic extraction of edited sentences from text edition histories. ☆80 Updated 2 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆10 Updated last week
- Curriculum training ☆15 Updated this week
- Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation ☆16 Updated 3 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 Updated last year
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆148 Updated 3 months ago
- ☆17 Updated 2 years ago
- Translation Memory Open-source Purifier ☆32 Updated last year
- Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks. ☆40 Updated 9 months ago