adbar / simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
☆139 · Updated last month
Related projects:
- Text tokenization and sentence segmentation (segtok v2) ☆200 · Updated 2 years ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆135 · Updated last month
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆225 · Updated last year
- 📂 Additional lookup tables and data resources for spaCy ☆98 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆148 · Updated last year
- A modern, interlingual wordnet interface for Python ☆207 · Updated 9 months ago
- spaCy + UDPipe ☆159 · Updated 2 years ago
- Parse and convert numbers written in French, English or Spanish into their digit representation. ☆100 · Updated last month
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆111 · Updated 4 months ago
- Python Finite-State Toolkit ☆39 · Updated last month
- Fuzzy matching and more functionality for spaCy. ☆249 · Updated 2 months ago
- German Morphological Analyzer ☆45 · Updated 2 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆149 · Updated 3 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆65 · Updated last year
- Sentence aligner ☆106 · Updated 3 years ago
- ✔️Contextual word checker for better suggestions ☆405 · Updated 6 months ago
- OpusFilter - Parallel corpus processing toolkit ☆101 · Updated last month
- 🧪 Cutting-edge experimental spaCy components and features ☆94 · Updated 4 months ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆42 · Updated last year
- Multilingual sentence alignment using sentence embeddings ☆92 · Updated 9 months ago
- A python module for English lemmatization and inflection. ☆258 · Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆151 · Updated last year
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆92 · Updated this week
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆191 · Updated last year
- A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Sp… ☆28 · Updated 2 years ago
- A python module for word inflections designed for use with spaCy. ☆90 · Updated 4 years ago
- ☆159 · Updated 3 months ago
- LASER multilingual sentence embeddings as a pip package ☆224 · Updated last year
- Translation Memory Open-source Purifier ☆32 · Updated last year
- Morphological Dictionaries for German Language ☆27 · Updated 6 years ago