harvard-lil / scoop
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
☆169 · Updated last week
Alternatives and similar repositories for scoop
Users who are interested in scoop are comparing it to the libraries listed below.
- Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more … ☆313 · Updated last week
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆36 · Updated 3 months ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ☆54 · Updated last year
- wabac.js - Web Archive Browsing Augmentation Client ☆113 · Updated last week
- A list of things related to software, literature, and other content for Memento ☆99 · Updated last year
- Specifications developed and maintained by the Webrecorder community. ☆135 · Updated 7 months ago
- Static Site Generator for Viewing Web Archives (in WACZ) format ☆27 · Updated 2 years ago
- Converts WARC files to static HTML ☆47 · Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 · Updated last month
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆181 · Updated 10 months ago
- searchmysite.net is an open source search engine and search as a service ☆132 · Updated last month
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆88 · Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆47 · Updated 3 weeks ago
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆96 · Updated 6 years ago
- Creates a complete full text historical archive for an RSS or ATOM feed. ☆123 · Updated 3 weeks ago
- Chrome extension to "Create WARC files from any webpage" ☆222 · Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 · Updated 2 weeks ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆131 · Updated 3 weeks ago
- Download and attach provenance to public datasets ☆33 · Updated 4 months ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆86 · Updated 4 months ago
- A command line utility for listing and searching snapshots in web archives ☆16 · Updated last year
- Tool to index and serve HTML files. Powered by Datasette. ☆104 · Updated 3 years ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆128 · Updated 3 weeks ago
- Snapshots a web page to get it as a static, self-contained HTML document. ☆294 · Updated 2 years ago
- Create and edit WARC and WACZ files ☆14 · Updated 8 months ago
- JavaScript module and CLI tool for working with web archive data using the WACZ format specification. ☆16 · Updated 5 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆65 · Updated 5 months ago
- A tool for detecting viruses and NSFW material in WARC files ☆15 · Updated last year
- Full text search all your browsing history using Postgres + WASM ☆132 · Updated 3 months ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 · Updated last year