adbar / simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
★154 · Updated 5 months ago
Alternatives and similar repositories for simplemma:
Users who are interested in simplemma are comparing it to the libraries listed below.
- Text tokenization and sentence segmentation (segtok v2) ★202 · Updated 3 years ago
- Additional lookup tables and data resources for spaCy ★105 · Updated 3 months ago
- A tokenizer and sentence splitter for German and English web and social media texts. ★142 · Updated 4 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ★245 · Updated 2 years ago
- spaCy + UDPipe ★161 · Updated 3 years ago
- 🧪 Cutting-edge experimental spaCy components and features ★98 · Updated last year
- A Python library for calculating a large variety of metrics from text ★337 · Updated 4 months ago
- German Morphological Analyzer ★47 · Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ★161 · Updated 2 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ★214 · Updated 3 months ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ★106 · Updated last year
- Sentence aligner ★112 · Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ★256 · Updated 10 months ago
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ★112 · Updated 11 months ago
- Language detection using Spacy and Fasttext ★55 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ★151 · Updated last year
- Use Hugging Face text and token classification pipelines directly in spaCy ★63 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ★104 · Updated last month
- coFR: COreference resolution tool for FRench (and singletons). ★24 · Updated 4 years ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ★67 · Updated 2 years ago
- Bilingual term extractor ★53 · Updated last year
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ★71 · Updated 2 years ago
- A modern, interlingual wordnet interface for Python ★244 · Updated this week
- Augmenty is an augmentation library based on spaCy for augmenting texts. ★153 · Updated 11 months ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ★80 · Updated 5 months ago
- LASER multilingual sentence embeddings as a pip package ★223 · Updated last year
- Python Finite-State Toolkit ★54 · Updated 2 months ago
- Multilingual sentence alignment using sentence embeddings ★116 · Updated 6 months ago
- Tools for shrinking fastText models (in gensim format) ★178 · Updated last year
- Extracts parallel corpora from the 2 raw texts in different languages. ★36 · Updated 2 years ago