siznax / wptools
Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis
☆585 · Updated last year
Alternatives and similar repositories for wptools
Users who are interested in wptools compare it to the libraries listed below.
- Wikidata client library for Python ☆354 · Updated 10 months ago
- read and edit a Wikibase instance from the command line ☆231 · Updated 2 weeks ago
- Fact Extraction from Wikipedia Text ☆535 · Updated 9 years ago
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆256 · Updated 9 months ago
- Entity linking system for Wikidata updated by your edits in real time ☆254 · Updated 6 months ago
- A Python parser for MediaWiki wikicode ☆797 · Updated last month
- A Wikidata Python module integrating the MediaWiki API and the Wikidata SPARQL endpoint ☆256 · Updated last year
- MediaWiki API wrapper in Python (http://pymediawiki.readthedocs.io/en/latest/) ☆183 · Updated 4 months ago
- Textpipe: clean and extract metadata from text ☆301 · Updated 3 years ago
- Filter and format a newline-delimited JSON stream of Wikibase entities ☆97 · Updated 7 months ago
- Python tools for interacting with Wikidata ☆152 · Updated last year
- Streaming WARC/ARC library for fast web archive IO ☆415 · Updated 5 months ago
- Quickly extract multi-word phrases from a corpus ☆191 · Updated 4 years ago
- A Python library that interfaces with the MediaWiki API. This is a mirror from gerrit.wikimedia.org. Do not submit any patches here. See … ☆680 · Updated this week
- Process Common Crawl data with Python and Spark ☆431 · Updated last week
- Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc. ☆630 · Updated 3 years ago
- Heuristic based boilerplate removal tool ☆780 · Updated 3 months ago
- Text Mining and Topic Modeling Toolkit for Python with parallel processing power ☆190 · Updated 2 years ago
- 📖 Library that provides ways to read from and iterate through the Wikibase entities in a Wikibase Repository JSON dump ☆74 · Updated 10 months ago
- A Python function to break down hashtags or compound words created by putting together multiple words ☆34 · Updated 9 years ago
- command-line tool to extract taxonomies from Wikidata ☆126 · Updated 5 years ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆733 · Updated 9 months ago
- A set of utility scripts to process Wikipedia related data ☆38 · Updated 2 years ago
- Tools for parsing and querying Wikimedia Foundation pageview data from both static dumps and the online API. ☆65 · Updated 3 years ago
- 💙 Emoji handling and meta data for spaCy with custom extension attributes ☆181 · Updated 2 years ago
- geoparsepy is a Python geoparsing library that will extract and disambiguate locations from text. It uses a local OpenStreetMap database … ☆63 · Updated 3 years ago
- Full text geoparsing as a Python library ☆751 · Updated 3 years ago
- A spaCy pipeline and model for NLP on unstructured legal text. ☆651 · Updated 10 months ago
- Mapping Wikipedia pages to Wikidata IDs and vice versa. ☆160 · Updated 2 years ago
- Twitter NLP Tools ☆889 · Updated 2 years ago