edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆75 · Updated 11 months ago
Alternatives and similar repositories for wayback
Users who are interested in wayback are comparing it to the libraries listed below.
- A helper library full of URL-related heuristics. ☆70 · Updated last month
- Alternative robots parser module for Python ☆18 · Updated 3 weeks ago
- A maximum-strength name parser for record linkage. ☆37 · Updated last month
- Parse government documents into well formed JSON ☆70 · Updated 3 weeks ago
- Python based Wikidata framework for easy dataframe extraction ☆45 · Updated last year
- A financial disclosure data extraction tool. ☆16 · Updated last year
- API client for Aleph, supports bulk entity and document upload. ☆28 · Updated 9 months ago
- A Python library for defining rule-based overrides on messy data ☆15 · Updated 3 months ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 7 months ago
- Data cleaning and validation functions for names, languages, identifiers, etc. ☆28 · Updated this week
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 · Updated 3 months ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data. ☆66 · Updated 2 weeks ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated 2 years ago
- Some tools to help analyze the twitter archive ☆62 · Updated last month
- Now included in rigour ☆151 · Updated 2 months ago
- Find rss, atom, xml, and rdf feeds on webpages ☆30 · Updated 9 months ago
- Write Datasette canned queries as plain SQL files ☆14 · Updated 3 years ago
- Utility library to turn country names into ISO two-letter codes ☆70 · Updated last month
- Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis. ☆92 · Updated 3 weeks ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆41 · Updated 3 weeks ago
- Guess gender from first name in Python 2 and 3 ☆136 · Updated last month
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆27 · Updated 11 months ago
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆180 · Updated 9 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated 2 years ago
- Service for creating Twitter datasets for research and archiving. ☆26 · Updated 2 years ago
- A deep learning model for extracting references from text ☆29 · Updated last year
- Save an RSS or ATOM feed to a SQLite database ☆52 · Updated 2 years ago
- A set of utilities for processing MediaWiki XML dump data. ☆56 · Updated 5 months ago