edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆67 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for wayback
- Wayback Machine API interface & a command-line tool ☆484 · Updated 9 months ago
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆60 · Updated this week
- A helper library full of URL-related heuristics. ☆64 · Updated last month
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆168 · Updated last month
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆23 · Updated 4 years ago
- Alternative robots parser module for Python ☆16 · Updated last month
- Inspect a URL and estimate if it contains a news story ☆39 · Updated this week
- A financial disclosure data extraction tool. ☆13 · Updated last year
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆123 · Updated 7 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆33 · Updated 2 weeks ago
- Write Datasette canned queries as plain SQL files ☆13 · Updated 2 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆122 · Updated last week
- A maximum-strength name parser for record linkage. ☆34 · Updated 3 months ago
- Save an RSS or ATOM feed to a SQLite database ☆47 · Updated 2 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆67 · Updated 3 weeks ago
- A Python implementation of Lunr.js 🌖 ☆189 · Updated 3 weeks ago
- Extract text from HTML ☆132 · Updated 4 years ago
- A Python library for defining rule-based overrides on messy data ☆12 · Updated this week
- Parse government documents into well formed JSON ☆66 · Updated 7 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 · Updated last year
- Extract networks of entities from journalistic reporting ☆47 · Updated last year
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆17 · Updated 2 years ago
- Quit Datasette if it has not received traffic for a specified time period ☆16 · Updated 8 months ago
- Python based Wikidata framework for easy dataframe extraction ☆39 · Updated last year
- Emojis for Python ☆21 · Updated last year
- Datasette plugin for rendering Markdown ☆25 · Updated last year
- Accurately find/replace/remove emojis in text strings ☆159 · Updated 11 months ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆39 · Updated 5 months ago
- A PDF classifier ensemble with REST API service ☆23 · Updated 3 years ago
- Python client for Zyte API ☆21 · Updated last month