uhermjakob / utoken
universal tokenizer
☆16 Updated 3 years ago
Alternatives and similar repositories for utoken:
Users who are interested in utoken are comparing it to the libraries listed below.
- Bilingual dictionary extractor from parallel corpora ☆22 Updated 10 years ago
- Extracts plain text, language identification and more metadata from WARC records ☆21 Updated 3 weeks ago
- A Python module for word inflections designed for use with spaCy. ☆92 Updated 5 years ago
- Transform TMX to text ☆28 Updated 2 years ago
- Bilingual sentence similarity classifier using Tensorflow ☆21 Updated 5 years ago
- Python Finite-State Toolkit ☆53 Updated 3 weeks ago
- English Resource Grammar ☆20 Updated 7 months ago
- Measure the similarity of text corpora for 74 languages ☆13 Updated last year
- GOPHI: an AMR-to-English Verbalizer ☆11 Updated 5 years ago
- This packages up data for the Open Multilingual Wordnet ☆47 Updated last week
- BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText ☆10 Updated 5 years ago
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆13 Updated 5 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 Updated 10 months ago
- Python framework for processing Universal Dependencies data ☆55 Updated this week
- Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format ☆22 Updated 6 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆68 Updated 3 years ago
- A simple configurable tool for manipulating dependency trees. ☆13 Updated 3 months ago
- English HPSG parser ☆51 Updated 6 years ago
- Runnable morphological analysis tools from the UniMorph project ☆15 Updated 6 years ago
- Efficient Low-Memory Aligner ☆142 Updated 2 months ago
- GC4LM: A Colossal (Biased) language model for German ☆13 Updated 3 years ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- Finds linguistic patterns effortlessly ☆35 Updated last year
- Efficient teacher-student models and scripts to make them ☆50 Updated last year
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆20 Updated 3 months ago
- ☆72 Updated 3 weeks ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 Updated last month
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆37 Updated 2 years ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 Updated 3 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago