5j9/wikitextparser

A Python library to parse MediaWiki WikiText

☆327

Alternatives and similar repositories for wikitextparser

Users who are interested in wikitextparser are comparing it to the libraries listed below.

earwig / mwparserfromhell
A Python parser for MediaWiki wikicode
☆890 · Jun 12, 2026 · Updated last month
tatuylonen / wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo…
☆115 · Updated this week
dgilman / py-wikimarkup
A MediaWiki-to-HTML parser for Python. Improved for Kitsune.
☆11 · Jan 26, 2023 · Updated 3 years ago
microsoft / FoundationModels
☆13 · Aug 20, 2021 · Updated 4 years ago
tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,222 · Updated this week
WikiExtractor / wikiextractor
A tool for extracting plain text from Wikipedia dumps
☆3,996 · Updated this week
ermongroup / bgm
Code for "Boosted Generative Models", AAAI 2018.
☆20 · Dec 26, 2017 · Updated 8 years ago
spencermountain / wtf_wikipedia
a pretty-committed wikipedia markup parser
☆851 · Jul 12, 2026 · Updated last week
lingua-libre / RecordWizard
🌻 MediaWiki extension allowing mass recording of clean, well cut, well named pronunciation files.
☆17 · Updated this week
robinkrahl / LrMediaWiki
MediaWiki for Lightroom
☆13 · Jan 8, 2022 · Updated 4 years ago
greyside / django-admin-smoke-tests
Runs some basic tests on your custom admin objects.
☆14 · Jun 19, 2024 · Updated 2 years ago
filipinascimento / openalex-raw
Tools to process OpenAlex raw snapshot files
☆12 · Mar 23, 2026 · Updated 4 months ago
bhsd-harry / wikiparser-node
A Node.js/browser parser for MediaWiki markup with AST
☆45 · Updated this week
Grasia / wiki-scripts
Miscellaneous scripts to gather and process data of wikis.
☆20 · Apr 20, 2023 · Updated 3 years ago
nelson-liu / lexical-semantic-recognition
☆18 · Jun 12, 2023 · Updated 3 years ago
shigapov / wikibase-knowledge-graphs
A collection of open source tools and resources related to Wikibase knowledge graphs
☆75 · Sep 9, 2025 · Updated 10 months ago
fnielsen / wembedder
Wikidata embedding
☆51 · Nov 5, 2024 · Updated last year
machelreid / m2d2
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54 · Nov 21, 2022 · Updated 3 years ago
berlino / weaksp_em19
Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs(EMNLP2019)
☆19 · Dec 3, 2019 · Updated 6 years ago
suyashb95 / WiktionaryParser
A Python Wiktionary Parser
☆375 · Jul 23, 2025 · Updated last year
lucaswerkmeister / cookiecutter-toolforge
cookiecutter template for Wikimedia Toolforge tools using Flask
☆25 · Nov 19, 2025 · Updated 8 months ago
fxamacker / circlehash
CircleHash is a family of fast hashes -- CircleHash64f is ideal for short inputs, reaching 10GB/s starting at <64 bytes and 15GB/s at 256…
☆24 · Jul 8, 2026 · Updated 2 weeks ago
ramtinms / tokenquery
TokenQuery (regular expressions over tokens)
☆28 · Mar 1, 2017 · Updated 9 years ago
shyamupa / xelms
☆19 · Dec 19, 2018 · Updated 7 years ago
jpbruinsslot / warc3
Python 3 library for reading and writing warc files
☆21 · Jan 29, 2018 · Updated 8 years ago
SuLab / sparql_to_pandas
Example for accessing SPARQL endpoints in Python with Pandas
☆13 · Aug 23, 2022 · Updated 3 years ago
maxdotio / mighty-batch
Highly concurrent and fast content processing for Mighty Inference Server
☆10 · Feb 6, 2023 · Updated 3 years ago
Transfusion / cjkvi-ids-unicode
Unicode-only CJKV IDS data
☆14 · Aug 9, 2024 · Updated last year
KTH-Library / openalex
R package to provide data access to OpenAlex by way of REST API
☆12 · Jan 12, 2026 · Updated 6 months ago
JunShern / few-shot-adaptation
Exploring Few-Shot Adaptation of Language Models with Tables
☆25 · Aug 22, 2022 · Updated 3 years ago
Yale-LILY / dart
Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation"
☆158 · Nov 21, 2022 · Updated 3 years ago
shyamupa / xling-el
pytorch model for cross-lingual entity linking.
☆16 · Mar 13, 2019 · Updated 7 years ago
bennofs / wdumper
Tool for generating filtered Wikidata RDF exports
☆46 · Apr 9, 2022 · Updated 4 years ago
addshore / addwiki
libraries, packages and applications for use with MediaWiki, Wikipedia, Wikibase and Wikidata in PHP
☆17 · Updated this week
jeromecc / awesome-health
A curated list of awesome open health software, libraries, tools and resources.
☆11 · Apr 13, 2017 · Updated 9 years ago
phucty / mtab_tool
MTab: Entity Search and Table Annotation with Wikidata, Wikipedia, and DBpedia
☆32 · May 30, 2022 · Updated 4 years ago
pytest-dev / pytest-plus
pytest-plus adds new features to pytest
☆12 · Oct 27, 2025 · Updated 8 months ago
goldsmith / Wikipedia
A Pythonic wrapper for the Wikipedia API
☆2,997 · May 12, 2024 · Updated 2 years ago
wikipedia2vec / wikipedia2vec
A tool for learning vector representations of words and entities from Wikipedia
☆967 · May 3, 2024 · Updated 2 years ago