lehinevych / MediaWikiAPI
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
☆42 · Updated 2 months ago
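As a rough illustration of what a MediaWiki API wrapper does under the hood, here is a minimal sketch that queries the MediaWiki Action API (`action=query&list=search`) directly with only the Python standard library. It is not MediaWikiAPI's own interface. The English Wikipedia endpoint, the helper names `build_search_url`, `parse_search_titles` and `search`, and the User-Agent string are all assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia's public Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term: str, limit: int = 5) -> str:
    """Build an Action API full-text search URL (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_search_titles(payload: dict) -> list[str]:
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def search(term: str, limit: int = 5) -> list[str]:
    """Fetch search results over the network and return the matching titles."""
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a
    # hypothetical placeholder.
    req = Request(
        build_search_url(term, limit),
        headers={"User-Agent": "example-script/0.1 (hypothetical)"},
    )
    with urlopen(req, timeout=10) as resp:
        return parse_search_titles(json.load(resp))
```

Calling `search("MediaWiki")` performs a live HTTP request. The URL builder and the response parser are kept as separate functions so they can be checked offline without touching the network.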
Alternatives and similar repositories for MediaWikiAPI
Users who are interested in MediaWikiAPI are comparing it to the libraries listed below
- A helper library full of URL-related heuristics. ☆73 · Updated last month
- Next-generation Punkt sentence boundary detection with zero dependencies ☆20 · Updated 2 months ago
- A Python implementation of Lunr.js 🌖 ☆200 · Updated 7 months ago
- MediaWiki API wrapper in Python http://pymediawiki.readthedocs.io/en/latest/ ☆186 · Updated last month
- Accurately find/replace/remove emojis in text strings ☆162 · Updated last year
- 🌸 Train floret vectors ☆18 · Updated 2 years ago
- A Python package to simulate typographical errors. ☆38 · Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆141 · Updated 3 months ago
- 🕊️ Radically lightweight command-line interfaces ☆109 · Updated last month
- Extract text from HTML ☆134 · Updated 5 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 · Updated last year
- Python Simple Object Storage - provides a list and dictionary interface that seamlessly stores data in a file, like a simplified database… ☆58 · Updated 2 years ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆68 · Updated 3 years ago
- A set of utilities for processing MediaWiki XML dump data. ☆57 · Updated 8 months ago
- Datasette plugin providing instructions for exporting data to Jupyter or Observable ☆13 · Updated 2 years ago
- 📂 Additional lookup tables and data resources for spaCy ☆112 · Updated 4 months ago
- A Python module to discover the etymology of words ☆151 · Updated last year
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆39 · Updated 3 years ago
- Utilize your personal data like Google! ☆160 · Updated 2 years ago
- Parse government documents into well formed JSON ☆73 · Updated 2 months ago
- Binary Python bindings for poppler utils for content extraction ☆42 · Updated 4 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 · Updated 3 years ago
- Language detection using spaCy and fastText ☆57 · Updated last year
- Alternative robots parser module for Python ☆20 · Updated last month
- Atom, RSS and JSON feed parser for Python 3 ☆117 · Updated 3 years ago
- Python functions for applied use of schema.org ☆36 · Updated 3 years ago
- URL normalization for Python ☆99 · Updated 6 months ago
- Now included in rigour ☆152 · Updated last month
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆149 · Updated 10 months ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆155 · Updated last month