edgi-govdata-archiving / wayback
A Python API to the Internet Archive Wayback Machine
☆66 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for wayback
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆23 · Updated 4 years ago
- Alternative robots parser module for Python ☆16 · Updated 2 weeks ago
- A helper library full of URL-related heuristics. ☆63 · Updated last month
- A financial disclosure data extraction tool. ☆13 · Updated last year
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated last year
- A maximum-strength name parser for record linkage. ☆32 · Updated 3 months ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆145 · Updated 9 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 · Updated 3 years ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆39 · Updated 4 months ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆17 · Updated 2 years ago
- Save an RSS or ATOM feed to a SQLite database ☆47 · Updated 2 years ago
- Loadable spellfix1 extension for sqlite as python package ☆25 · Updated 6 months ago
- A Python library for defining rule-based overrides on messy data ☆12 · Updated 9 months ago
- Extract text from HTML ☆130 · Updated 4 years ago
- Web scraping Page Objects core library ☆95 · Updated 3 weeks ago
- Datasette plugin for modifying table schemas ☆16 · Updated 2 months ago
- ☆60 · Updated 8 months ago
- Support for writing WARC files with Scrapy ☆20 · Updated 4 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 · Updated 5 years ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆40 · Updated 3 months ago
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 · Updated 4 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆123 · Updated 7 months ago
- Write Datasette canned queries as plain SQL files ☆13 · Updated 2 years ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated 3 weeks ago
- Add website scraping abilities to Datasette ☆61 · Updated last year
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆15 · Updated this week
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆33 · Updated last month
- a tool to snapshot sqlite databases you don't own ☆19 · Updated 2 weeks ago
- Tools for running enrichments against data stored in Datasette ☆19 · Updated 2 months ago
- Find rss, atom, xml, and rdf feeds on webpages ☆30 · Updated last month