michmech / lemmatization-lists
Machine-readable lists of lemma-token pairs in 23 languages.
☆335 Updated 3 years ago
Alternatives and similar repositories for lemmatization-lists:
Users who are interested in lemmatization-lists compare it to the libraries listed below.
- English Lemma Database - Compiled by Referencing British National Corpus ☆29 Updated 4 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 Updated 3 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆238 Updated 2 years ago
- A multilingual parallel corpus created from translations of the Bible. ☆177 Updated 5 months ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- A python module for English lemmatization and inflection. ☆265 Updated last year
- 🎀 JavaScript API for spaCy with Python REST API ☆196 Updated last year
- A modern, interlingual wordnet interface for Python ☆232 Updated 2 weeks ago
- Automatically exported from code.google.com/p/universal-pos-tags ☆129 Updated 2 years ago
- Universal Dependencies online documentation ☆281 Updated this week
- WordNet in JSON format. ☆90 Updated 4 years ago
- ConllEditor is a tool to edit dependency syntax trees in CoNLL-U format. ☆55 Updated 2 months ago
- 📂 Additional lookup tables and data resources for spaCy ☆101 Updated 3 weeks ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 Updated this week
- Bitextor generates translation memories from multilingual websites ☆293 Updated 3 months ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆138 Updated 2 months ago
- Sentence aligner ☆109 Updated 3 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆17 Updated this week
- Gather modern English word frequencies from all enwiki articles. ☆211 Updated 11 months ago
- General-Purpose Neural Networks for Sentence Boundary Detection ☆72 Updated last year
- 💫 REST microservices for various spaCy-related tasks ☆240 Updated 2 years ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆375 Updated 2 months ago
- Various utilities for processing the data. ☆207 Updated this week
- Text tokenization and sentence segmentation (segtok v2) ☆201 Updated 2 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆413 Updated 3 weeks ago
- Offline database of synonyms/thesaurus ☆191 Updated last year
- Index Common Crawl archives in tabular format ☆110 Updated 3 months ago
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ ☆188 Updated 6 years ago
- Hunspell extension for spaCy 2.0. ☆94 Updated 6 months ago
- German Morphological Analyzer ☆47 Updated 3 years ago