uhermjakob / wildebeest
Scripts investigate, repair and normalize a wide range of text file problems at the character level.
☆18 · Updated 2 years ago
Alternatives and similar repositories for wildebeest:
Users who are interested in wildebeest are comparing it to the repositories listed below.
- Bilingual sentence similarity classifier using Tensorflow ☆20 · Updated 5 years ago
- Efficient Low-Memory Aligner ☆142 · Updated last month
- Curriculum training ☆17 · Updated this week
- ☆25 · Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆50 · Updated last month
- NTREX -- News Test References for MT Evaluation ☆81 · Updated 9 months ago
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆57 · Updated this week
- ☆71 · Updated last week
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 · Updated 3 years ago
- ☆14 · Updated 4 years ago
- OpusFilter - Parallel corpus processing toolkit ☆104 · Updated last week
- Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018 ☆122 · Updated 4 years ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆48 · Updated 2 months ago
- Examples, tutorials and use cases for Marian, including our WMT-2017/18 baselines. ☆76 · Updated last year
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me… ☆40 · Updated 6 months ago
- Efficient teacher-student models and scripts to make them ☆50 · Updated last year
- ☆15 · Updated last year
- ☆22 · Updated 3 years ago
- Dockerized NMT frameworks for nmt-wizard ☆39 · Updated last year
- ☆23 · Updated 7 months ago
- Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks. ☆41 · Updated last year
- Microsoft Speech Language Translation (MSLT) Corpus ☆19 · Updated 7 years ago
- universal tokenizer ☆15 · Updated 3 years ago
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆35 · Updated this week
- Morfessor EM+Prune ☆10 · Updated 4 years ago
- Multilingual sentence alignment using sentence embeddings ☆110 · Updated 4 months ago
- This dataset contains naturally-occurring English sentences that feature non-trivial noun-verb ambiguity. ☆35 · Updated 5 years ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆26 · Updated 4 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆151 · Updated this week
- Python framework for processing Universal Dependencies data ☆55 · Updated this week