tatuylonen/wiktextract

Wiktionary dump file parser and multilingual data extractor

☆1,230

Alternatives and similar repositories for wiktextract

Users who are interested in wiktextract are comparing it to the libraries listed below.

tatuylonen / wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo…
☆115 · Jul 20, 2026 · Updated last week
suyashb95 / WiktionaryParser
A Python Wiktionary Parser
☆376 · Jul 23, 2025 · Updated last year
abuccts / wikt2pron
A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format
☆34 · Jul 5, 2019 · Updated 7 years ago
gambolputty / wiktionary-de-parser
Extract data from German Wiktionary XML files.
☆26 · May 29, 2026 · Updated 2 months ago
componavt / wikokit
Machine-readable Wiktionary
☆79 · May 6, 2024 · Updated 2 years ago
Vuizur / ebook_dictionary_creator
Code to create a database with cleaned up Wiktionary data and then to create ebook dictionaries based on this data.
☆36 · Aug 16, 2023 · Updated 2 years ago
xinjli / phonepiece
phone inventory library
☆17 · May 15, 2023 · Updated 3 years ago
globalwordnet / english-wordnet
The Open English WordNet
☆838 · Updated this week
clefourrier / EtymDB
[LREC 2020] EtymDB, an Etymological DataBase (v2.1)
☆28 · Jan 4, 2022 · Updated 4 years ago
yomidevs / wiktionary-to-yomitan
Yomitan-compatible dictionaries from Wiktionary data
☆195 · Jul 15, 2026 · Updated 2 weeks ago
juditacs / wikt2dict
Wiktionary parser tool for many language editions.
☆54 · Aug 17, 2022 · Updated 3 years ago
gambolputty / german-nouns
A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat…
☆175 · Dec 29, 2024 · Updated last year
benreynwar / wiktionary-parser
A parser and autocorrection tool for Wiktionary.
☆39 · Dec 4, 2015 · Updated 10 years ago
abdnh / anki-wiktionary
Anki add-on to look up vocabulary using Wiktionary
☆27 · Jun 1, 2026 · Updated last month
5j9 / wikitextparser
A Python library to parse MediaWiki WikiText
☆327 · Jul 18, 2026 · Updated last week
unimorph / wiktionary-tools
Tools for scraping, annotating, and parsing morphological information from Wiktionary
☆15 · Oct 19, 2019 · Updated 6 years ago
CUNY-CL / wikipron
Massively multilingual pronunciation mining
☆371 · Updated this week
remusao / wgraph
Etymological graphs based on Wiktionary dumps
☆26 · Mar 6, 2025 · Updated last year
rspeer / wordfreq
Access a database of word frequencies, in various natural languages.
☆1,714 · Jan 4, 2025 · Updated last year
adbar / simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
☆210 · Updated this week
DanielSWolf / wiki-pronunciation-dict
Pronunciation dictionaries for several languages, based on Wiktionary data.
☆21 · Nov 28, 2021 · Updated 4 years ago
lingpy / lingpy
LingPy: Python library for quantitative tasks in historical linguistics
☆145 · May 27, 2026 · Updated 2 months ago
seth-js / yomichan-de
A German hover dictionary. It's a modified version of Yomichan that works with German.
☆34 · Oct 31, 2023 · Updated 2 years ago
FreeLanguageTools / vocabsieve
Simple sentence mining tool for language learning
☆528 · Aug 15, 2025 · Updated 11 months ago
hermitdave / FrequencyWords
Repository for Frequency Word List Generator and processed files
☆1,523 · Feb 7, 2022 · Updated 4 years ago
dmort27 / panphon
Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
☆320 · Oct 22, 2025 · Updated 9 months ago
dan1wang / jsonbook-builder
Wiktionary in accessible JSON format
☆35 · Dec 3, 2022 · Updated 3 years ago
epistularum / hunspell-ja-deinflection
Hunspell dictionary to deinflect all Japanese conjugated verbs to the dictionary form and suggest correct spelling.
☆14 · Sep 12, 2022 · Updated 3 years ago
Vuizur / Wiktionary-Dictionaries
Ebook reader dictionaries extracted from Wiktionary in almost all languages, in Stardict, Tabfile and Kindle format
☆174 · May 19, 2023 · Updated 3 years ago
earwig / mwparserfromhell
A Python parser for MediaWiki wikicode
☆892 · Updated this week
kbatsuren / MorphyNet
MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)
☆59 · Apr 2, 2023 · Updated 3 years ago
open-dict-data / ipa-dict
Monolingual wordlists with pronunciation information in IPA
☆787 · May 24, 2025 · Updated last year
kbatsuren / wiktra
Wiktra: Python tool of Wiktionary transliteration modules for 514 languages and their 102 different scripts (orthographies)
☆37 · Jun 29, 2025 · Updated last year
freedict / fd-dictionaries
Hand-written dictionaries from the FreeDict project
☆479 · Jul 5, 2026 · Updated 3 weeks ago
korhoj / wiktionary-convert-no-db
Convert Wiktionary entries to various formats such as StarDict or DB (MariaDB/MySQL). I'm dropping the database support for this new main…
☆17 · Oct 5, 2025 · Updated 9 months ago
stefan-it / ukrainian-electra
Ukrainian ELECTRA model
☆12 · Mar 11, 2023 · Updated 3 years ago
omwn / omw-data
This packages up data for the Open Multilingual Wordnet
☆69 · Mar 28, 2026 · Updated 4 months ago
zifeo / Etymap
Interactive visualization of Wiktionary words and etymologies.
☆102 · Updated this week
open-dsl-dict / ipa-dict-dsl
IPA Pronunciation Dictionaries in DSL format
☆44 · Jan 13, 2017 · Updated 9 years ago