lehinevych / MediaWikiAPI
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
☆39 Updated 5 months ago
Related projects
Alternatives and complementary repositories for MediaWikiAPI
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- Parse numbers written in natural language ☆109 Updated 3 weeks ago
- An experimental implementation of Burrows' Delta in Python 3 ☆20 Updated 3 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆38 Updated 2 years ago
- Alternative robots parser module for Python ☆16 Updated 3 weeks ago
- Atom, RSS and JSON feed parser for Python 3 ☆115 Updated 2 years ago
- A helper library full of URL-related heuristics. ☆64 Updated last month
- Python wrapper for Ferret ☆42 Updated 2 years ago
- List of all countries with names and ISO 3166-1 codes in all languages. ☆26 Updated 3 weeks ago
- Sort-friendly URI Reordering Transform (SURT) Python module ☆40 Updated 3 months ago
- Utility library to turn country names into ISO two-letter codes ☆66 Updated this week
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP tasks with legal texts. ☆37 Updated 5 years ago
- Python based Wikidata framework for easy dataframe extraction ☆39 Updated 11 months ago
- Named entity recognition for the legal domain ☆40 Updated 3 years ago
- A Python library for defining rule-based overrides on messy data ☆12 Updated this week
- Language detection using spaCy and fastText ☆54 Updated 11 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆122 Updated last week
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆144 Updated 10 months ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 Updated 2 years ago
- Python port for IWNLP.Lemmatizer ☆17 Updated last year
- Extract networks of entities from journalistic reporting ☆47 Updated last year
- Generate reports for spaCy models. ☆28 Updated 2 years ago
- A Python implementation of Lunr.js 🌖 ☆189 Updated 3 weeks ago
- A set of utilities for processing MediaWiki XML dump data. ☆45 Updated 3 months ago
- International Address formatter which considers the standard formatting rules of the country ☆26 Updated 3 years ago
- 🌸 Train floret vectors ☆18 Updated last year
- Python 3 library for reading and writing WARC files ☆21 Updated 6 years ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆17 Updated 2 years ago