harvard-lil / scoop
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
⭐ 160 · Updated last month
Alternatives and similar repositories for scoop
Users who are interested in scoop are comparing it to the libraries listed below.
- Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more … ⭐ 277 · Updated this week
- wabac.js - Web Archive Browsing Augmentation Client ⭐ 108 · Updated this week
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ⭐ 32 · Updated 3 weeks ago
- A tool for detecting viruses and NSFW material in WARC files ⭐ 15 · Updated 9 months ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ⭐ 54 · Updated last year
- Specifications developed and maintained by the Webrecorder community. ⭐ 131 · Updated 4 months ago
- A list of things related to software, literature, and other content for Memento ⭐ 98 · Updated last year
- Automated behaviors that run in the browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ⭐ 40 · Updated 3 weeks ago
- Converts WARC files to static HTML ⭐ 44 · Updated 11 months ago
- Chrome extension to "Create WARC files from any webpage" ⭐ 220 · Updated last year
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ⭐ 79 · Updated last month
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ⭐ 179 · Updated 7 months ago
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ⭐ 95 · Updated 6 years ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ⭐ 85 · Updated last month
- Static site generator for viewing web archives in WACZ format ⭐ 27 · Updated last year
- A Memento Aggregator CLI and Server in Go ⭐ 64 · Updated 2 months ago
- Creates a complete full text historical archive for an RSS or ATOM feed. ⭐ 119 · Updated this week
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ⭐ 113 · Updated 2 weeks ago
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ⭐ 18 · Updated 2 months ago
- WARC and ARC indexing and discovery tools. ⭐ 124 · Updated 2 months ago
- Web archive index server based on RocksDB ⭐ 34 · Updated 3 weeks ago
- Download and attach provenance to public datasets ⭐ 32 · Updated 2 months ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ⭐ 54 · Updated 3 months ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ⭐ 121 · Updated 5 months ago
- Command line tool for digging into WARC files ⭐ 40 · Updated 2 weeks ago
- ⭐ 44 · Updated last year
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ⭐ 160 · Updated 4 years ago
- A command line utility for listing and searching snapshots in web archives ⭐ 16 · Updated last year
- ⭐ 11 · Updated 11 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ⭐ 58 · Updated 2 months ago