michmech / lemmatization-lists
Machine-readable lists of lemma-token pairs in 23 languages.
☆323, updated 2 years ago
Related projects:
- All languages stopwords collection (☆420, updated 8 months ago)
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files (☆358, updated last week)
- Gather modern English word frequencies from all enwiki articles. (☆198, updated 6 months ago)
- 📂 Additional lookup tables and data resources for spaCy (☆98, updated last year)
- A modern, interlingual wordnet interface for Python (☆207, updated 9 months ago)
- English Lemma Database - Compiled by Referencing British National Corpus (☆29, updated 2 years ago)
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency (☆139, updated last month)
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. (☆225, updated last year)
- A python module for English lemmatization and inflection. (☆258, updated last year)
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German (☆446, updated 3 weeks ago)
- Universal Dependencies online documentation (☆269, updated this week)
- A multilingual parallel corpus created from translations of the Bible. (☆172, updated 3 months ago)
- Crawler for linguistic corpora (☆190, updated 9 months ago)
- Sentence aligner (☆106, updated 3 years ago)
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary. (☆308, updated this week)
- Various utilities for processing the data. (☆203, updated this week)
- Machine-Translation-based sentence alignment tool for parallel text (☆295, updated 3 years ago)
- Bitextor generates translation memories from multilingual websites (☆287, updated 3 months ago)
- A tokenizer and sentence splitter for German and English web and social media texts. (☆135, updated last month)
- A Python Wiktionary Parser (☆358, updated 8 months ago)
- Compact Language Detector 2 (☆836, updated 3 years ago)
- The Open English WordNet (☆459, updated last week)
- hand-written dictionaries from the FreeDict project (☆388, updated 10 months ago)
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … (☆111, updated 4 months ago)
- Modern spell checking library - accurate, fast, multi-language (☆605, updated 3 weeks ago)
- A cloud-based, open-source system for writing and publishing dictionaries. (☆85, updated 8 months ago)
- spaCy + UDPipe (☆159, updated 2 years ago)
- 🎀 JavaScript API for spaCy with Python REST API (☆193, updated last year)
- Automatically exported from code.google.com/p/universal-pos-tags (☆129, updated 2 years ago)
- Named Entity Recognition data for Europeana Newspapers (☆171, updated last year)