edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆69 Updated 5 months ago
Alternatives and similar repositories for wayback:
Users who are interested in wayback are comparing it to the libraries listed below.
- Alternative robots parser module for Python ☆17 Updated last month
- Wayback Machine API interface & a command-line tool ☆494 Updated 10 months ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆23 Updated 4 years ago
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆170 Updated 3 months ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆39 Updated 7 months ago
- Template repository for Python projects ☆33 Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆36 Updated this week
- Write Datasette canned queries as plain SQL files ☆13 Updated 2 years ago
- A helper library full of URL-related heuristics. ☆64 Updated 3 months ago
- Python based Wikidata framework for easy dataframe extraction ☆41 Updated last year
- A set of utilities for processing MediaWiki XML dump data. ☆48 Updated 5 months ago
- Support for writing WARC files with Scrapy ☆21 Updated 5 years ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 Updated 3 years ago
- Extract networks of entities from journalistic reporting ☆47 Updated last year
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated last year
- Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked … ☆32 Updated last year
- A pure Python Levenshtein implementation that's not freaking GPL'd. ☆96 Updated last year
- A maximum-strength name parser for record linkage. ☆36 Updated 5 months ago
- An extremely configurable markdown reverser for Python3. ☆15 Updated 11 months ago
- Datasette plugin for modifying table schemas ☆17 Updated 4 months ago
- 🌬️ urlExpander is a Python package for expanding shortened links (urls). ☆73 Updated 2 years ago
- Add website scraping abilities to Datasette ☆62 Updated last year
- Sort-friendly URI Reordering Transform (SURT) python module ☆41 Updated 5 months ago
- A community provider for the python faker library to fake airline data for testing purposes. ☆16 Updated last year
- Loadable spellfix1 extension for sqlite as python package ☆25 Updated 8 months ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- Datasette plugin for searching all searchable tables at once ☆21 Updated 4 months ago
- Datasette plugin for uploading CSV files and converting them to database tables ☆25 Updated 9 months ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆15 Updated last week
- Save an RSS or ATOM feed to a SQLite database ☆47 Updated 2 years ago