mideind / Tokenizer
A tokenizer for Icelandic text.
☆29 Updated 3 weeks ago
Alternatives and similar repositories for Tokenizer
Users who are interested in Tokenizer are comparing it to the libraries listed below.
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆256 Updated 3 years ago
- Language independent truecaser in Python. ☆159 Updated 4 years ago
- Overview of Icelandic NLP resources at a glance ☆16 Updated last year
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆115 Updated last year
- A Python 3 phonetics library. ☆137 Updated 5 years ago
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 Updated 6 years ago
- Convert number words (e.g. twenty one) to numeric digits (21) ☆180 Updated 2 years ago
- Cython wrapper on Hunspell Dictionary ☆66 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆182 Updated 7 months ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆112 Updated 7 months ago
- Hunspell extension for spaCy 2.0. ☆94 Updated last year
- spaCy + UDPipe ☆165 Updated 3 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆208 Updated 3 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆34 Updated 3 years ago
- Hy-phen-ation made easy ☆218 Updated this week
- A tokenizer and sentence splitter for German and English web and social media texts. ☆150 Updated last year
- A lemmatizer for Icelandic text ☆17 Updated 7 years ago
- Compound splitter for German ☆110 Updated 5 years ago
- NLTK Contrib ☆168 Updated last year
- (Official repo for pypi package) Python bindings for the Hunspell spellchecker engine ☆190 Updated 4 years ago
- Language Acquisition Research Tools ☆43 Updated last month
- LASER multilingual sentence embeddings as a pip package ☆225 Updated 2 years ago
- A sentence segmenter that actually works! ☆304 Updated 5 years ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆66 Updated 2 weeks ago
- 🤘Lemmy is a lemmatizer for Danish 🇩🇰 and Swedish 🇸🇪 ☆79 Updated 4 years ago
- Text and Punctuation correction with Deep Learning ☆128 Updated 5 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆74 Updated last year
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆391 Updated last month
- ☆176 Updated 9 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆155 Updated 2 years ago