michmech/lemmatization-lists

Machine-readable lists of lemma-token pairs in 23 languages.

☆365

Alternatives and similar repositories for lemmatization-lists

Users interested in lemmatization-lists are comparing it to the repositories listed below.

aaaton / golem
A lemmatizer implemented in Go
☆98 · May 9, 2025 · Updated last year
adbar / simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
☆209 · Updated this week
explosion / spacy-lookups-data
📂 Additional lookup tables and data resources for spaCy
☆116 · Jun 4, 2025 · Updated last year
DuyguA / german-morph-dictionaries
Morphological dictionaries for the German language
☆32 · Apr 29, 2026 · Updated 2 months ago
michmech / Gramadan
Gramadán: a computational grammar of Irish
☆17 · Jan 23, 2023 · Updated 3 years ago
hermitdave / FrequencyWords
Repository for Frequency Word List Generator and processed files
☆1,520 · Feb 7, 2022 · Updated 4 years ago
bjascob / LemmInflect
A Python module for English lemmatization and inflection.
☆280 · Sep 14, 2023 · Updated 2 years ago
juanalonso / nomenclator
A town-name generator using an LSTM neural network
☆14 · Mar 24, 2023 · Updated 3 years ago
lenakmeth / Wikinflection-Corpus
The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni…
☆12 · Dec 15, 2023 · Updated 2 years ago
Expertium / expertium.github.io
☆14 · Mar 30, 2026 · Updated 3 months ago
argilla-io / spacy-wordnet
spacy-wordnet creates annotations that make it easy to use WordNet and WordNet Domains through the NLTK WordNet interface
☆261 · Aug 21, 2025 · Updated 11 months ago
KaiQiangSong / joint_parse_summ
(AAAI'20) The source code for the paper "Joint Parsing and Generation for Abstractive Summarization".
☆24 · Apr 22, 2020 · Updated 6 years ago
FinNLP / fin
🚀 Node.js Natural Language Processor written in TypeScript
☆46 · Feb 15, 2019 · Updated 7 years ago
xiety / AnkiHistoryVisualization
Anki History Visualization
☆19 · Nov 28, 2024 · Updated last year
DiceTechJobs / RelevancyFeedback
Dice.com's relevancy feedback Solr plugin created by Simon Hughes (Dice). Contains request handlers for doing MLT style recommendations, …
☆23 · May 12, 2021 · Updated 5 years ago
richardwilly98 / elasticsearch-opennlp-auto-tagging
Auto tagging with OpenNLP
☆15 · Nov 20, 2013 · Updated 12 years ago
dluman / rusTy
Rust bindings for the spaCy library.
☆24 · Dec 11, 2022 · Updated 3 years ago
dav009 / awesome-spanish-nlp
Curated list of Linguistic Resources for doing NLP & CL on Spanish
☆351 · Jan 9, 2024 · Updated 2 years ago
ninja33 / mdx-server
A service that reads mdx/mdd files and provides an HTTP interface
☆261 · Jul 2, 2021 · Updated 5 years ago
KaniyamFoundation / all_tamil_nouns
A project to collect all Tamil nouns
☆12 · Dec 14, 2024 · Updated last year
tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,222 · Updated this week
aholab / AhoTTS
Text-to-Speech converter for Basque and Spanish. It includes linguistic processing and built voices for both languages. Its…
☆18 · Jan 15, 2026 · Updated 6 months ago
jzohrab / pact
Python GUI tool for language learning: create clips from mp3 files, add transcription via Vosk AI, and export to Anki
☆22 · Mar 7, 2025 · Updated last year
noword / MdxConverter
Generates an HTML, PDF, or JPG file for specific words from an mdx dictionary
☆41 · Dec 11, 2023 · Updated 2 years ago
tatuylonen / wikitextprocessor
Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. Fo…
☆115 · Updated this week
diyclassics / latin-spacy-models
Preliminary spaCy models for Latin
☆14 · Oct 20, 2022 · Updated 3 years ago
cltl / SpaCy-to-NAF
spaCy-to-NAF converter
☆21 · Jun 10, 2025 · Updated last year
abuccts / wikt2pron
A Python toolkit for converting pronunciations in the enwiktionary XML dump to cmudict format
☆34 · Jul 5, 2019 · Updated 7 years ago
originell / smaz-py3
Small string compression using the smaz compression algorithm. Fast, because it's in C. Supports Python 3+
☆13 · Oct 18, 2025 · Updated 9 months ago
LBeaudoux / tatoebatools
A library for fetching and reading Tatoeba's weekly exports
☆24 · Feb 5, 2026 · Updated 5 months ago
Kozea / Pyphen
Hy-phen-ation made easy
☆230 · Jun 19, 2026 · Updated last month
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 · Dec 17, 2021 · Updated 4 years ago
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 · Nov 7, 2023 · Updated 2 years ago
antonisa / embeddings
Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages
☆15 · Apr 11, 2020 · Updated 6 years ago
moos / wordpos-web
wordpos for the web/browser
☆43 · May 7, 2021 · Updated 5 years ago
michmech / irish-word-frequency
About 6,500 Irish lemmas ordered by corpus frequency, with noise removed.
☆37 · May 11, 2018 · Updated 8 years ago
mfaruqui / morph-trans
Code for morphological transformations
☆29 · Jun 3, 2017 · Updated 9 years ago
ines / spacy-js
🎀 JavaScript API for spaCy with Python REST API
☆202 · Sep 16, 2023 · Updated 2 years ago
rspeer / wordfreq
Access a database of word frequencies in various natural languages.
☆1,710 · Jan 4, 2025 · Updated last year