edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆84 Updated 2 weeks ago
Alternatives and similar repositories for wayback
Users who are interested in wayback are comparing it to the libraries listed below.
- A helper library full of URL-related heuristics. ☆73 Updated 3 months ago
- Wayback Machine API interface & a command-line tool ☆556 Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆189 Updated last year
- Alternative robots parser module for Python ☆20 Updated last month
- Parse government documents into well formed JSON ☆75 Updated this week
- A maximum-strength name parser for record linkage. ☆39 Updated 4 months ago
- Now included in rigour ☆152 Updated last month
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆130 Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆55 Updated last month
- Public API client for GETTR, a "non-bias [sic] social network," designed for data archival and analysis. ☆96 Updated last week
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆18 Updated 2 years ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 Updated last week
- Fast and robust date extraction from web pages, with Python or on the command-line ☆143 Updated 2 months ago
- Python library and command line tool for collecting JSON data from Gab.ai. Scrape posts, users and comments from "free-speech" social med… ☆38 Updated 3 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 Updated 2 years ago
- A polite and user-friendly downloader for Common Crawl data ☆65 Updated 5 months ago
- A set of utilities for processing MediaWiki XML dump data. ☆61 Updated 11 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆30 Updated 4 years ago
- A webmining CLI tool & library for python. ☆346 Updated last week
- URL normalization for Python ☆99 Updated 8 months ago
- A database of court reporters, tests and other experiments ☆119 Updated last week
- UNOFFICIAL Python API to interface with Parler.com ☆53 Updated last year
- Extract networks of entities from journalistic reporting ☆49 Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload. ☆29 Updated last year
- Guess gender from first name in Python 2 and 3 ☆138 Updated 7 months ago
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆19 Updated 2 weeks ago
- A multithread Pushshift.io API Wrapper for reddit.com comment and submission searches. ☆220 Updated 2 years ago
- A financial disclosure data extraction tool. ☆19 Updated 2 years ago
- Some tools to help analyze the twitter archive ☆64 Updated 7 months ago
- A Python library for defining rule-based overrides on messy data ☆17 Updated last month