siznax / wptools
Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis
☆585 · Updated last year
Alternatives and similar repositories for wptools:
Users who are interested in wptools are comparing it to the libraries listed below:
- Wikidata client library for Python ☆355 · Updated 9 months ago
- A Python library that interfaces with the MediaWiki API. This is a mirror from gerrit.wikimedia.org. Do not submit any patches here. See … ☆670 · Updated this week
- A Python parser for MediaWiki wikicode ☆790 · Updated last month
- Tools for parsing and querying Wikimedia Foundation pageview data from both static dumps and the online API. ☆65 · Updated 3 years ago
- Collection of tools for building diachronic/historical word vectors ☆431 · Updated last year
- Python client library to interface with the MediaWiki API ☆326 · Updated last month
- Python tools for interacting with Wikidata ☆153 · Updated last year
- Entity linking system for Wikidata updated by your edits in real time ☆254 · Updated 5 months ago
- A Python library to parse MediaWiki WikiText ☆307 · Updated 6 months ago
- The software used to extract structured data from Wikipedia ☆895 · Updated 2 months ago
- A Wikidata Python module integrating the MediaWiki API and the Wikidata SPARQL endpoint ☆255 · Updated last year
- PYthon Automated Term Extraction ☆311 · Updated 2 years ago
- A machine learning tool for fishing entities ☆264 · Updated last month
- Python wrapper for Wikipedia ☆676 · Updated last week
- Read and edit a Wikibase instance from the command line ☆230 · Updated 2 months ago
- Textpipe: clean and extract metadata from text ☆301 · Updated 3 years ago
- Guidelines. ☆96 · Updated 9 months ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆97 · Updated 6 months ago
- Examples for using the dedupe library ☆411 · Updated 8 months ago
- A Python library for parsing unstructured Western names into name components ☆606 · Updated 6 months ago
- spacy-wordnet creates annotations that make it easy to use WordNet and WordNet Domains through the NLTK WordNet interface ☆255 · Updated 8 months ago
- CLI for loading Wikidata subsets (or all of it) into Elasticsearch ☆70 · Updated 3 years ago
- Geotext extracts country and city mentions from text ☆139 · Updated 2 years ago
- Quickly extract multi-word phrases from a corpus ☆191 · Updated 4 years ago
- MediaWiki API wrapper in Python (http://pymediawiki.readthedocs.io/en/latest/) ☆183 · Updated 3 months ago
- KnowledgeNet: A Benchmark Dataset for Knowledge Base Population ☆268 · Updated 3 years ago
- Python library for reading and writing WARC files ☆240 · Updated 3 years ago
- A set of utility scripts to process Wikipedia-related data ☆38 · Updated 2 years ago
- A Python utility for downloading Common Crawl data ☆237 · Updated last year
- Streaming WARC/ARC library for fast web archive IO ☆411 · Updated 4 months ago