Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
☆186 · Jun 6, 2025 · Updated 8 months ago
Alternatives and similar repositories for simplemma
Users interested in simplemma are comparing it to the libraries listed below
- ANYKS Spell-Checker ☆19 · Jan 3, 2023 · Updated 3 years ago
- The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni… ☆12 · Dec 15, 2023 · Updated 2 years ago
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ☆1,639 · Nov 21, 2025 · Updated 3 months ago
- A Python module and REST API for automatic extraction of metadata from PDF files ☆18 · Nov 11, 2024 · Updated last year
- A list of awesome open source projects in the machine learning field, whose developers are mainly based in Germany ☆53 · Sep 10, 2024 · Updated last year
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database. ☆20 · Jul 5, 2024 · Updated last year
- Compare the accuracy of UDPipe and spaCy models that can be used for NLP annotation ☆14 · Feb 11, 2018 · Updated 8 years ago
- Search an in-memory corpus with Corpus Query Language (CQL) ☆19 · Dec 2, 2024 · Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆145 · Nov 4, 2025 · Updated 3 months ago
- Faster, modernized fork of the language identification tool langid.py ☆60 · Nov 22, 2024 · Updated last year
- 📂 Additional lookup tables and data resources for spaCy ☆113 · Jun 4, 2025 · Updated 8 months ago
- Python port of IWNLP.Lemmatizer ☆18 · Oct 18, 2023 · Updated 2 years ago
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detector that works out of the box. ☆904 · Aug 20, 2024 · Updated last year
- Fuzzy matching and more functionality for spaCy. ☆259 · Jul 6, 2024 · Updated last year
- Efficient trie-based regex unions for blacklist/whitelist filtering and one-pass, mapping-based string replacement ☆76 · Jan 22, 2026 · Updated last month
- A Python module for English lemmatization and inflection. ☆273 · Sep 14, 2023 · Updated 2 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆29 · Nov 18, 2025 · Updated 3 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆159 · Dec 19, 2025 · Updated 2 months ago
- Database for experiments with Russian VoxForge audio data (http://voxforge.org/ru/downloads). ☆14 · Aug 31, 2021 · Updated 4 years ago
- Question generation from text ☆15 · Sep 19, 2012 · Updated 13 years ago
- A Reddit bot that finds the original publication dates of linked articles. ☆10 · Nov 30, 2024 · Updated last year
- Morphological analyzer / inflection engine for Russian and Ukrainian languages. Fork of https://github.com/pymorphy2/pymorphy2 ☆11 · Dec 1, 2025 · Updated 3 months ago
- JavaScript port of SymSpell for Node.js ☆13 · Sep 30, 2022 · Updated 3 years ago
- A Vim plug-in that calculates the Flesch-Kincaid readability index per line. ☆12 · Aug 17, 2020 · Updated 5 years ago
- RUSSE: Russian Semantic Evaluation. ☆16 · Mar 1, 2022 · Updated 4 years ago
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- Code from http://www.ark.cs.cmu.edu/mheilman/questions/ ☆12 · Apr 23, 2013 · Updated 12 years ago
- KenLM extension for spaCy 2.0. ☆16 · Dec 6, 2017 · Updated 8 years ago
- Preliminary spaCy models for Latin ☆14 · Oct 20, 2022 · Updated 3 years ago
- ☆12 · May 18, 2022 · Updated 3 years ago
- List of corpora annotated for coreference for different languages ☆17 · Aug 8, 2024 · Updated last year
- Yet Another Z39.50-powered Chatbot ☆12 · Oct 9, 2023 · Updated 2 years ago
- Ad-hoc lightweight SPARQL endpoint from a file, using Python Flask and RDFlib ☆15 · Oct 24, 2016 · Updated 9 years ago
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums. ☆252 · Updated this week
- ☆18 · Jun 12, 2023 · Updated 2 years ago
- Repo contains Jupyter notebooks compiled during my review of the programming books listed. ☆13 · Mar 9, 2022 · Updated 3 years ago
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations. ☆15 · Jun 9, 2019 · Updated 6 years ago
- CSV on the Web parser ☆17 · Updated this week
- Datasets for the task of tracing diachronic semantic shifts in Russian for two large-scale time period pairs (from pre-Soviet to Soviet t… ☆14 · Feb 21, 2025 · Updated last year