uhermjakob / wildebeest
Scripts to investigate, repair and normalize a wide range of text file problems at the character level.
☆20 · Updated 2 years ago
Alternatives and similar repositories for wildebeest
Users who are interested in wildebeest are comparing it to the libraries listed below.
- Translation demonstrator ☆34 · Updated 5 years ago
- universal tokenizer ☆16 · Updated 3 years ago
- Bilingual sentence similarity classifier using Tensorflow ☆24 · Updated 6 years ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆51 · Updated 6 months ago
- Efficient teacher-student models and scripts to make them ☆52 · Updated last year
- A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages. ☆36 · Updated last week
- OpusFilter - Parallel corpus processing toolkit ☆110 · Updated 3 weeks ago
- ☆31 · Updated last year
- Multilingual sentence alignment using sentence embeddings ☆126 · Updated 11 months ago
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆14 · Updated 2 years ago
- A python module for word inflections designed for use with spaCy. ☆93 · Updated 5 years ago
- Improved Sentence Alignment in Linear Time and Space ☆184 · Updated 2 years ago
- Curated list of open source and openly accessible large language models ☆26 · Updated 2 years ago
- Efficient Low-Memory Aligner ☆146 · Updated 9 months ago
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ☆34 · Updated 7 months ago
- Microsoft Speech Language Translation (MSLT) Corpus ☆19 · Updated 8 years ago
- OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPU… ☆80 · Updated 8 months ago
- Experiments with Hugging Face 🔬 🤗 ☆44 · Updated last year
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆50 · Updated 2 years ago
- A web application that interfaces two GEC systems. [web instance is down] ☆32 · Updated last year
- Open information and community for machine translation ☆80 · Updated last week
- web based editor for subtitles and transcripts ☆141 · Updated last year
- Text tokenization and sentence segmentation (segtok v2) ☆206 · Updated 3 years ago
- Curriculum training ☆18 · Updated 3 months ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆147 · Updated 10 months ago
- A guide to building language technology in new languages. ☆59 · Updated 3 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆52 · Updated 2 weeks ago
- A python true casing utility that restores case information for texts ☆89 · Updated 2 years ago
- Fast Neural Machine Translation in C++ - development repository ☆21 · Updated last year
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆25 · Updated 3 years ago