edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆81Updated 2 weeks ago
Alternatives and similar repositories for wayback
Users that are interested in wayback are comparing it to the libraries listed below
- Wayback Machine API interface & a command-line tool☆555Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service☆188Updated last year
- A helper library full of URL-related heuristics.☆73Updated 3 months ago
- Alternative robots parser module for Python☆20Updated 2 weeks ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.☆130Updated 2 months ago
- Now included in rigour☆152Updated last month
- Python based Wikidata framework for easy dataframe extraction☆45Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload.☆28Updated last year
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex.☆62Updated last week
- A web mining CLI tool & library for Python.☆344Updated last week
- A maximum-strength name parser for record linkage.☆39Updated 3 months ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.…☆24Updated 5 years ago
- Guess gender from first name in Python 2 and 3☆138Updated 7 months ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023☆19Updated 2 years ago
- Effortless conversion between data formats like JSON, XML and CSV☆119Updated 3 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line☆142Updated last month
- Python API for PDF documents☆124Updated last year
- A Python library for defining rule-based overrides on messy data☆16Updated last month
- ☆63Updated 11 months ago
- Libzim binding for Python: read/write ZIM files in Python☆97Updated 2 weeks ago
- Data cleaning and validation functions for names, languages, identifiers, etc.☆50Updated last week
- Accurately find/replace/remove emojis in text strings☆163Updated 2 years ago
- A PDF classifier ensemble with REST API service☆23Updated 4 years ago
- A financial disclosure data extraction tool.☆18Updated 2 years ago
- Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis.☆95Updated 3 weeks ago
- Extract networks of entities from journalistic reporting☆49Updated 2 years ago
- A modern Python library for writing maintainable web scrapers.☆248Updated last month
- Integer to Roman numerals converter☆50Updated last month
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 6 years ago
- 🌬️urlExpander is a Python package for expanding shortened links (urls).☆76Updated 3 years ago