amir-zeldes / RFTokenizer
A character-wise tokenizer for morphologically rich languages
☆27 · Updated this week
Alternatives and similar repositories for RFTokenizer:
Users who are interested in RFTokenizer are comparing it to the libraries listed below.
- A neural network that jointly part-of-speech tags and lemmatizes sentences, boosting accuracy for morphologically rich languages (Czech, … ☆34 · Updated 5 years ago
- Runnable morphological analysis tools from the UniMorph project ☆15 · Updated 6 years ago
- Python framework for processing Universal Dependencies data ☆55 · Updated this week
- German Morphological Analyzer ☆47 · Updated 3 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆17 · Updated this week
- A cloud-based, open-source system for writing and publishing dictionaries. ☆89 · Updated last year
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages. ☆23 · Updated last year
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 · Updated last year
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆41 · Updated last year
- A part-of-speech tagger with support for domain adaptation and external resources. ☆22 · Updated 2 years ago
- The NLG tool for Finnish ☆22 · Updated last year
- ☆14 · Updated 2 years ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆112 · Updated 10 months ago
- A tool for automatic spelling normalization ☆20 · Updated 4 years ago
- ☆63 · Updated 9 months ago
- Efficient Low-Memory Aligner ☆142 · Updated last month
- ☆46 · Updated 7 months ago
- An NLP pipeline for Hebrew ☆36 · Updated this week
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆75 · Updated last month
- Multi Tier Annotation Search ☆26 · Updated 3 years ago
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆126 · Updated 3 years ago
- A tool for text normalisation via character-level machine translation ☆13 · Updated 4 years ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆64 · Updated this week
- ☆19 · Updated 3 years ago
- English web corpus with 4M tokens and several annotation types ☆26 · Updated last year
- Compiled tools, datasets, and other resources for historical text normalization. ☆18 · Updated 5 years ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆27 · Updated 5 years ago
- Python Finite-State Toolkit ☆53 · Updated 2 weeks ago
- ☆26 · Updated 2 years ago