edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆73 Updated 9 months ago
Alternatives and similar repositories for wayback
Users who are interested in wayback are comparing it to the libraries listed below.
- Wayback Machine API interface & a command-line tool ☆528 Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆179 Updated 7 months ago
- Alternative robots parser module for Python ☆18 Updated 2 months ago
- A helper library full of URL-related heuristics. ☆69 Updated 2 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆28 Updated 4 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated 2 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆129 Updated last year
- Guess gender from first name in Python 2 and 3 ☆134 Updated last week
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 Updated 4 years ago
- ☆62 Updated 4 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆127 Updated 5 months ago
- Now included in rigour ☆151 Updated 3 weeks ago
- A Python implementation of Lunr.js 🌖 ☆195 Updated 2 months ago
- Extract text from HTML ☆135 Updated 4 years ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆40 Updated 2 months ago
- Newsfeed based on GDELT Project ☆26 Updated last year
- Python Unicode Block Utilities ☆24 Updated 4 years ago
- 🐾 PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file. ☆15 Updated last year
- Data cleaning and validation functions for names, languages, identifiers, etc. ☆21 Updated this week
- Web scraping Page Objects core library ☆101 Updated this week
- Save an RSS or ATOM feed to a SQLite database ☆52 Updated 2 years ago
- URL utilities for markdown-it (a Python port) ☆13 Updated 2 months ago
- etl pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆23 Updated last year
- Atom, RSS and JSON feed parser for Python 3 ☆117 Updated 2 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆40 Updated 3 weeks ago
- A set of utilities for processing MediaWiki XML dump data. ☆53 Updated 3 months ago
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆19 Updated 3 months ago
- Pythonic wrapper for the Google Sheets API ☆123 Updated 7 months ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆19 Updated last month
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆173 Updated 4 months ago