uhermjakob / utoken
universal tokenizer
☆17 · Updated 3 years ago
Alternatives and similar repositories for utoken
Users who are interested in utoken are comparing it to the libraries listed below:
- This packages up data for the Open Multilingual Wordnet ☆50 · Updated 2 months ago
- A set of pipelines for performing experiments on various NLP tasks, with a focus on resource-poor/minority languages. ☆36 · Updated this week
- Character-level conversion between Hebrew text and Latin transliteration using deep learning; a demonstration of seq2seq training. ☆14 · Updated 2 years ago
- Python framework for processing Universal Dependencies data ☆58 · Updated 2 weeks ago
- Wiktra: Python tool of Wiktionary transliteration modules for 514 languages and their 102 different scripts (orthographies) ☆30 · Updated last month
- A Python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- Bilingual sentence similarity classifier using TensorFlow ☆23 · Updated 5 years ago
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆69 · Updated 2 months ago
- An experimental implementation of Burrows's Delta in Python 3 ☆21 · Updated 3 years ago
- Master repo for the UniMorph project; includes the UniMorph schema and annotated data files ☆32 · Updated 5 years ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆73 · Updated this week
- A sentence segmentation library with wide language support, optimized for speed and utility. ☆66 · Updated last month
- Transform TMX to text ☆27 · Updated 2 years ago
- Efficient teacher-student models and scripts to make them ☆51 · Updated last year
- OpusFilter: parallel corpus processing toolkit ☆109 · Updated this week
- Efficient Low-Memory Aligner ☆146 · Updated 6 months ago
- ☆74 · Updated 4 months ago
- A Python truecasing utility that restores case information for text ☆89 · Updated 2 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 · Updated last month
- Python Finite-State Toolkit ☆57 · Updated last week
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 · Updated 3 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆47 · Updated 2 years ago
- Downloads and parses the subtitle dataset from opensubtitles.org ☆16 · Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆168 · Updated 2 months ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆38 · Updated 3 years ago
- English Resource Grammar ☆21 · Updated 2 weeks ago
- Sentence aligner ☆116 · Updated 4 years ago
- A simple configurable tool for manipulating dependency trees. ☆14 · Updated 7 months ago
- Text tokenization and sentence segmentation (segtok v2) ☆205 · Updated 3 years ago
- ☆65 · Updated last year