Wiktionary dump file parser and multilingual data extractor
☆1,108 · updated Feb 25, 2026
Alternatives and similar repositories for wiktextract
Users interested in wiktextract also compare it to the libraries listed below.
- Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. Fo… (☆107, updated Feb 9, 2026)
- A Python Wiktionary Parser (☆371, updated Jul 23, 2025)
- A Python toolkit that converts pronunciations in the enwiktionary XML dump to cmudict format (☆33, updated Jul 5, 2019)
- Offline bilingual dictionaries made using data from Wiktionary (☆62, updated Apr 25, 2015)
- Machine-readable Wiktionary (☆78, updated May 6, 2024)
- Extracts data from German Wiktionary XML files (☆26, updated Jan 8, 2026)
- A comprehensive and extensible Wiktionary parsing framework (☆24, updated Sep 5, 2024)
- The Open English WordNet (☆734, updated Feb 4, 2026)
- Tools for scraping, annotating, and parsing morphological information from Wiktionary (☆15, updated Oct 19, 2019)
- Massively multilingual pronunciation mining (☆362, updated Jan 13, 2026)
- A Python library to parse MediaWiki WikiText (☆319, updated May 15, 2025)
- Anki add-on to look up vocabulary using Wiktionary (☆24, updated Feb 25, 2025)
- Interactive visualization of Wiktionary words and etymologies (☆98, updated Feb 20, 2026)
- Yomitan-compatible dictionaries from Wiktionary data (☆154, updated Feb 4, 2026)
- Java Wiktionary Library (☆60, updated Nov 19, 2022)
- Wiktra: a Python tool built on Wiktionary's transliteration modules, covering 514 languages and their 102 different scripts (orthographies) (☆34, updated Jun 29, 2025)
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) (☆26, updated Jan 4, 2022)
- LingPy: Python library for quantitative tasks in historical linguistics (☆140, updated Dec 6, 2025)
- Combining encoder-based language models (☆11, updated Nov 11, 2021)
- A corpus of diacritized Hebrew texts (טקסט מנוקד) (☆11, updated May 4, 2022)
- Monolingual wordlists with pronunciation information in IPA (☆732, updated May 24, 2025)
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features (☆295, updated Oct 22, 2025)
- A Python parser for MediaWiki wikicode (☆862, updated Jul 1, 2025)
- Imports Wiktionary's grammatical data into Wikidata (☆18, updated Jan 11, 2020)
- Simple sentence mining tool for language learning (☆509, updated Aug 15, 2025)
- Processing the grammar dictionary of A. A. Zaliznyak for morphological inflection (☆19, updated Jun 4, 2020)
- Simple multilingual lemmatizer for Python, especially useful where speed and efficiency matter (☆186, updated Jun 6, 2025)
- Bicleaner: a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus (☆160, updated Jun 18, 2024)
- Ebook reader dictionaries extracted from Wiktionary in almost all languages, in StarDict, Tabfile, and Kindle formats (☆133, updated May 19, 2023)
- Repository for a frequency word list generator and its processed files (☆1,450, updated Feb 7, 2022)
- A parser and autocorrection tool for Wiktionary (☆39, updated Dec 4, 2015)
- Packages up data for the Open Multilingual Wordnet (☆64, updated Feb 1, 2026)
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) (☆801, updated Dec 24, 2025)
- (☆16, updated Jan 20, 2022)
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) (☆55, updated Apr 2, 2023)
- russtress: a Python package that accentuates (marks word stress in) Russian text (☆63, updated May 13, 2020)
- IPA pronunciation dictionaries in DSL format (☆44, updated Jan 13, 2017)
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages (☆15, updated Apr 11, 2020)
- A German hover dictionary: a modified version of Yomichan that works with German (☆31, updated Oct 31, 2023)