edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆78 Updated last week
Alternatives and similar repositories for wayback
Users who are interested in wayback are comparing it to the libraries listed below.
- A helper library full of URL-related heuristics. ☆70 Updated this week
- API client for Aleph, supports bulk entity and document upload. ☆28 Updated 11 months ago
- A maximum-strength name parser for record linkage. ☆38 Updated 3 weeks ago
- A set of utilities for processing MediaWiki XML dump data. ☆57 Updated 7 months ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆130 Updated last month
- Alternative robots parser module for Python ☆19 Updated 3 weeks ago
- Taupe takes a downloaded Twitter archive ZIP file, extracts the URLs corresponding to tweets, retweets, replies, quote tweets, and liked … ☆33 Updated 2 years ago
- Now included in rigour ☆151 Updated 2 weeks ago
- Accurately find/replace/remove emojis in text strings ☆162 Updated last year
- A Python library for defining rule-based overrides on messy data ☆16 Updated 2 weeks ago
- Python based Wikidata framework for easy dataframe extraction ☆45 Updated last year
- A modern Python library for writing maintainable web scrapers. ☆249 Updated 3 months ago
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆19 Updated 2 weeks ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 Updated 3 weeks ago
- The country converter (coco) - a Python package for converting country names between different classification schemes. ☆245 Updated 2 months ago
- A webmining CLI tool & library for Python. ☆336 Updated this week
- Parse government documents into well formed JSON ☆73 Updated last month
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated 2 years ago
- Guess gender from first name in Python 2 and 3 ☆137 Updated 4 months ago
- URLExtract is a Python class for collecting (extracting) URLs from given text based on locating the TLD. ☆268 Updated last year
- Datasette plugin for uploading CSV files and converting them to database tables ☆27 Updated last year
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 Updated this week
- URL normalization for Python ☆98 Updated 5 months ago
- A repo to collect issues with calmcode.io ☆16 Updated 5 years ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated 10 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆141 Updated last month
- An automated, programming-free web scraper for interactive sites ☆111 Updated 2 years ago
- An open-source package for Python to clean raw text data ☆71 Updated 2 years ago
- A Python implementation of Lunr.js 🌖 ☆200 Updated 6 months ago
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆118 Updated last year