edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆85 Updated last week
Alternatives and similar repositories for wayback
Users who are interested in wayback are comparing it to the libraries listed below.
- A helper library full of URL-related heuristics. ☆73 Updated this week
- Wayback Machine API interface & a command-line tool ☆561 Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆189 Updated 3 weeks ago
- Parse government documents into well-formed JSON ☆75 Updated 3 weeks ago
- Python-based Wikidata framework for easy dataframe extraction ☆45 Updated 2 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆130 Updated last month
- A maximum-strength name parser for record linkage. ☆39 Updated 5 months ago
- Alternative robots parser module for Python ☆20 Updated 2 weeks ago
- Data cleaning and validation functions for names, languages, identifiers, etc. ☆52 Updated last week
- A modern Python library for writing maintainable web scrapers. ☆249 Updated 2 months ago
- Python functions for applied use of schema.org ☆38 Updated 4 years ago
- A set of utilities for processing MediaWiki XML dump data. ☆61 Updated 11 months ago
- A Python library for defining rule-based overrides on messy data ☆18 Updated 2 months ago
- Some tools to help analyze the Twitter archive ☆64 Updated 8 months ago
- Python CLI tool and library for diffing CSV and JSON files ☆328 Updated last year
- URL normalization for Python ☆99 Updated 9 months ago
- Guess gender from first name in Python 2 and 3 ☆139 Updated 8 months ago
- Support for writing WARC files with Scrapy ☆24 Updated 6 years ago
- Python API for PDF documents ☆124 Updated last year
- Datasette plugin that shows a map for any data with latitude/longitude columns ☆100 Updated 3 months ago
- Libzim binding for Python: read/write ZIM files in Python ☆97 Updated 2 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆55 Updated this week
- Datasette plugin to create interactive dashboards ☆174 Updated this week
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆63 Updated last week
- A database of court reporters, tests and other experiments ☆122 Updated this week
- API client for Aleph, supports bulk entity and document upload. ☆29 Updated last year
- Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis. ☆95 Updated last month
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆43 Updated last month
- A Python implementation of Lunr.js 🌖 ☆204 Updated 11 months ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆18 Updated 2 years ago