harvard-lil / scoop
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
☆166 Updated last month
Alternatives and similar repositories for scoop
Users who are interested in scoop are comparing it to the libraries listed below.
- Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more … ☆306 Updated this week
- wabac.js - Web Archive Browsing Augmentation Client ☆111 Updated last week
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ☆54 Updated last year
- Specifications developed and maintained by the Webrecorder community. ☆132 Updated 6 months ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆35 Updated 2 months ago
- A list of things related to software, literature, and other content for Memento ☆99 Updated last year
- Static Site Generator for Viewing Web Archives (in WACZ) format ☆27 Updated 2 years ago
- Converts WARC files to static HTML ☆46 Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 Updated 3 weeks ago
- A Memento Aggregator CLI and Server in Go ☆67 Updated 4 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆44 Updated this week
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆96 Updated 6 years ago
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ☆180 Updated 9 months ago
- Web archive index server based on RocksDB ☆34 Updated 2 weeks ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆83 Updated last week
- A tool for detecting viruses and NSFW material in WARC files ☆15 Updated 11 months ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 Updated last month
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆86 Updated 3 months ago
- A tool to detect whether a PDF has a bad redaction ☆145 Updated 3 weeks ago
- Chrome extension to "Create WARC files from any webpage" ☆222 Updated last year
- A command line utility for listing and searching snapshots in web archives ☆16 Updated last year
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆127 Updated last week
- Download and attach provenance to public datasets ☆33 Updated 4 months ago
- searchmysite.net is an open source search engine and search as a service ☆131 Updated 3 weeks ago
- Comparing warc files ☆17 Updated 6 years ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆118 Updated this week
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org ☆40 Updated 2 weeks ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆125 Updated 7 months ago
- Indelible links ☆474 Updated last month
- Command line tool for digging into WARC files ☆44 Updated last week