tatuylonen/wikitextprocessor

Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.

☆115

Alternatives and similar repositories for wikitextprocessor

Users who are interested in wikitextprocessor are comparing it to the libraries listed below.

tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,222 · Updated this week
wincent / wikitext
🌐 Fast wikitext-to-HTML translator
☆41 · Dec 24, 2025 · Updated 7 months ago
Vuizur / ebook_dictionary_creator
Code to create a database with cleaned up Wiktionary data and then to create ebook dictionaries based on this data.
☆36 · Aug 16, 2023 · Updated 2 years ago
DanielSWolf / wiki-pronunciation-dict
Pronunciation dictionaries for several languages, based on Wiktionary data.
☆21 · Nov 28, 2021 · Updated 4 years ago
nclarius / pyPL
Analytic tableau based minimal model generator, model checker and theorem prover for first-order logic with modal extensions
☆20 · Jul 5, 2026 · Updated 2 weeks ago
wswu / yawipa
A comprehensive and extensible Wiktionary parsing framework.
☆25 · Sep 5, 2024 · Updated last year
earwig / mwparserfromhell
A Python parser for MediaWiki wikicode
☆890 · Jun 12, 2026 · Updated last month
5j9 / wikitextparser
A Python library to parse MediaWiki WikiText
☆327 · Updated this week
pymorphy2-fork / DAWG
DAFSA-based dictionary-like read-only objects for Python. Based on `dawgdic` C++ library. Fork of https://github.com/pytries/DAWG
☆16 · Jul 1, 2026 · Updated 3 weeks ago
suyashb95 / WiktionaryParser
A Python Wiktionary Parser
☆375 · Jul 23, 2025 · Updated last year
karlb / wikdict-web
Web front end for WikDict dictionaries
☆20 · Jun 19, 2026 · Updated last month
salgo60 / Wikidata_riksdagen-corpus
Repository for matching Wikidata with riksdagen-corpus
☆14 · Nov 15, 2025 · Updated 8 months ago
iaramer / dobbi
An open-source NLP library: fast text cleaning and preprocessing
☆23 · Nov 9, 2021 · Updated 4 years ago
abuccts / wikt2pron
A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format
☆34 · Jul 5, 2019 · Updated 7 years ago
reynoldsnlp / udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
☆30 · Updated this week
jurta / nlp-rus-zaliz
Processing the grammar dictionary of A. A. Zaliznyak for morphological inflection
☆19 · Jun 4, 2020 · Updated 6 years ago
dbklim / StressRNN
Modified version of RusStress (https://github.com/MashaPo/russtress), a Python package for placing stress in Russian text using RNN (BiLST…
☆45 · Aug 7, 2024 · Updated last year
kmike / russian-tagsets
Russian morphological tagset converters library.
☆43 · Oct 4, 2019 · Updated 6 years ago
magnusmanske / papers
☆12 · May 20, 2026 · Updated 2 months ago
JanWielemaker / chat80
Classical CHAT80 NLP system for Prolog
☆28 · Feb 27, 2025 · Updated last year
elexis-eu / MWSA
Datasets for the Monolingual Word Sense Alignment (MWSA) task
☆12 · Nov 10, 2020 · Updated 5 years ago
Nixinova / Wikity
Compile wikitext to HTML: wikitext as a templating language.
☆16 · Feb 15, 2026 · Updated 5 months ago
internetarchive / iari
Import workflows for the Wikipedia Citations Database
☆13 · Jun 23, 2026 · Updated last month
filipinascimento / openalex-raw
Tools to process OpenAlex raw snapshot files
☆12 · Mar 23, 2026 · Updated 4 months ago
epfl-dlab / WikiHist.html
This is a repo containing all code and steps taken to download, setup the process and convert the whole English Wikipedia history from Wi…
☆14 · Jun 8, 2020 · Updated 6 years ago
xflr6 / features
Feature set algebra for linguistics
☆17 · Jul 7, 2026 · Updated 2 weeks ago
EvaSeidlmayer / orcid-for-wikidata
Import information (affiliation, education) from ORCID database to Wikidata regarding authors of scientific papers
☆16 · May 25, 2023 · Updated 3 years ago
open-dict-data / wikidict-en
Wikipedia Bilingual Reference Data (English)
☆18 · Jun 17, 2016 · Updated 10 years ago
daandouwe / ngram-lm
A simple n-gram language model.
☆12 · Sep 11, 2018 · Updated 7 years ago
rycolab / aflt-f2022
☆15 · Aug 17, 2022 · Updated 3 years ago
LBeaudoux / tatoebatools
A library for fetching and reading Tatoeba's weekly exports
☆24 · Feb 5, 2026 · Updated 5 months ago
sherlok / sherlok-python
Python client for Sherlok
☆14 · Jun 20, 2016 · Updated 10 years ago
natasha / nerus
Large silver standard Russian corpus with NER, morphology and syntax markup
☆76 · Apr 13, 2026 · Updated 3 months ago
VeryBigSad / lizachatbot
GPT-3 Chatbot with long-term memory and external sources. Original work & inspiration by @daveshap
☆17 · Jan 29, 2023 · Updated 3 years ago
hsajjad / ConceptX
Analyzing Latent Concept in Pre-trained Transformer Models
☆12 · Jul 18, 2022 · Updated 4 years ago
calyxir / calyx-riscv
RISCV Core written in Calyx
☆17 · Aug 16, 2024 · Updated last year
fayrose / MiddleEgyptianDictionaryWebsite
A dictionary for Middle Egyptian hieroglyphics.
☆19 · Jan 21, 2026 · Updated 6 months ago
ssbc / private-group-spec
☆15 · Nov 7, 2023 · Updated 2 years ago
izumi-h / ccgcomp
Logical inference system based on event semantics and degree semantics in formal semantics
☆10 · Jan 22, 2023 · Updated 3 years ago