adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

☆209

Alternatives and similar repositories for simplemma

Users who are interested in simplemma are comparing it to the libraries listed below.

Proteusiq / luga
Blazing fast language detection using fastText model
☆24 · Dec 18, 2022 · Updated 3 years ago

pymorphy2-fork / pymorphy2
Morphological analyzer / inflection engine for Russian and Ukrainian languages. Fork of https://github.com/pymorphy2/pymorphy2
☆11 · Jul 1, 2026 · Updated 3 weeks ago

pemistahl / lingua-py
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
☆1,763 · Updated this week

adbar / htmldate
Fast and robust date extraction from web pages, with Python or on the command-line
☆154 · Updated this week

anyks / asc
ANYKS Spell-Checker
☆19 · Jan 3, 2023 · Updated 3 years ago

gambolputty / newscorpus
A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.
☆20 · Jul 5, 2024 · Updated 2 years ago

steinst / SentAlign
☆38 · Mar 16, 2026 · Updated 4 months ago

nsu-ai-team / voxforge_ru_sphinx_experiments
Database for experiments with Russian voxforge audio data (http://voxforge.org/ru/downloads).
☆14 · Aug 31, 2021 · Updated 4 years ago

KorAP / Krill
A Corpus Data Retrieval Index using Lucene for Look-Ups
☆20 · Updated this week

slub / docsa
SLUB Document Classification and Similarity Analysis
☆10 · Aug 31, 2023 · Updated 2 years ago

explosion / spacy-lookups-data
📂 Additional lookup tables and data resources for spaCy
☆116 · Jun 4, 2025 · Updated last year

liao961120 / concordancer
Searching in-memory corpus with Corpus Query Language (CQL)
☆19 · Dec 2, 2024 · Updated last year

nipunsadvilkar / pySBD
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
☆925 · Aug 20, 2024 · Updated last year

NatLibFi / Annif
Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
☆264 · Jul 1, 2026 · Updated 3 weeks ago

nlpub / russe
RUSSE: Russian Semantic Evaluation.
☆15 · Mar 1, 2022 · Updated 4 years ago

pd3f / dehyphen
📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF
☆39 · Mar 8, 2022 · Updated 4 years ago

superlinear-ai / wtpsplit-lite
✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models
☆39 · May 2, 2026 · Updated 2 months ago

DoodleBears / split-lang
✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux
☆74 · Sep 18, 2025 · Updated 10 months ago

ddelange / retrie
Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing
☆76 · Jul 1, 2026 · Updated 3 weeks ago

daac-tools / crawdad
🦞 Rust library of natural language dictionaries using character-wise double-array tries.
☆38 · Jan 13, 2025 · Updated last year

okfde / froide-govplan
Basis of FragDenStaat.de's „Koalitionstracker“
☆15 · Jul 14, 2025 · Updated last year

gandersen101 / spaczz
Fuzzy matching and more functionality for spaCy.
☆258 · Jul 6, 2024 · Updated 2 years ago

Liebeck / spacy-iwnlp
German lemmatization with IWNLP as extension for spaCy
☆27 · Apr 13, 2026 · Updated 3 months ago

adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,326 · Updated this week

SkBlaz / rakun2
RaKUn 2.0 - A fast keyword detection algorithm
☆73 · Aug 5, 2025 · Updated 11 months ago

nareike / adhs
Ad-hoc lightweight SPARQL endpoint from a file, using Python Flask and RDFlib
☆15 · Oct 24, 2016 · Updated 9 years ago

diyclassics / latin-spacy-models
Preliminary spaCy models for Latin
☆14 · Oct 20, 2022 · Updated 3 years ago

originell / smaz-py3
Small string compression using smaz compression algorithm. Fast, because it's in C. Supports Python 3+
☆13 · Oct 18, 2025 · Updated 9 months ago

stefan-it / italian-bertelectra
🇮🇹 Italian BERT and ELECTRA models (incl. evaluation)
☆18 · Oct 20, 2022 · Updated 3 years ago

bsolomon1124 / pycld3
Python3 bindings for the Compact Language Detector v3 (CLD3)
☆154 · Apr 19, 2026 · Updated 3 months ago

xnliang98 / CKE-ZH
A centrality-based Chinese keyphrase extraction tool
☆11 · Sep 2, 2022 · Updated 3 years ago

rsling / texrex
texrex web page cleaning & ClaraX random walk crawler
☆11 · Dec 13, 2021 · Updated 4 years ago

Helsinki-NLP / OpusFilter
OpusFilter - Parallel corpus processing toolkit
☆115 · Jul 1, 2026 · Updated 3 weeks ago

Erotemic / xdev
An excellent developer tool for excellent developers
☆13 · Jul 10, 2026 · Updated last week

rdf-ext / rdf-parser-csvw
CSV on the Web parser
☆17 · Updated this week

explosion / floret
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
☆343 · Apr 25, 2025 · Updated last year

drkane / datasette-reconcile
Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.
☆24 · Feb 2, 2024 · Updated 2 years ago

ai-forever / augmentex
Augmentex — a library for augmenting texts with errors
☆69 · Jul 3, 2024 · Updated 2 years ago

chatnoir-eu / web-content-extraction-benchmark
Web Content Extraction Benchmark
☆27 · Dec 16, 2025 · Updated 7 months ago