harvard-lil/scoop

harvard-lil / scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

☆207

Alternatives and similar repositories for scoop

Users who are interested in scoop are comparing it to the libraries listed below.

harvard-lil / wacz-exhibitor
Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages.
☆44 · Nov 24, 2025 · Updated 8 months ago
harvard-lil / js-wacz
JavaScript module and CLI tool for working with web archive data using the WACZ format specification.
☆17 · Mar 11, 2025 · Updated last year
webrecorder / browsertrix-behaviors
Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler.
☆58 · Jul 23, 2026 · Updated last week
harvard-lil / thread-keeper
(Experimental) High-fidelity capture of Twitter threads as sealed PDFs.
☆55 · Dec 4, 2023 · Updated 2 years ago
harvard-lil / waczerciser
Create and edit WARC and WACZ files
☆29 · Dec 6, 2024 · Updated last year
ukwa / ukwa-pywb
☆11 · Nov 21, 2025 · Updated 8 months ago
anjackson / sliver
A tool for collection archival slivers of the web and web archives
☆19 · Jun 1, 2026 · Updated last month
webrecorder / web-replay-gen
Static Site Generator for Viewing Web Archives (in WACZ) format
☆29 · Jun 30, 2023 · Updated 3 years ago
webrecorder / py-wacz
☆61 · Apr 11, 2024 · Updated 2 years ago
webrecorder / pywb-remote-browsers
Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives
☆16 · Jun 10, 2021 · Updated 5 years ago
webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more …
☆446 · Updated this week
webrecorder / web-archive-site-mirror
☆18 · Apr 16, 2026 · Updated 3 months ago
webrecorder / browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
☆1,093 · Updated this week
NationalLibraryOfNorway / warchaeology
Command line tool for digging into WARC files
☆50 · Updated this week
webrecorder / replayweb.page
Serverless replay of web archives directly in the browser
☆965 · Jul 13, 2026 · Updated 2 weeks ago
harvard-lil / perma
Websites change. Perma Links don't.
☆524 · Updated this week
IIIF-Commons / iiif-builder
☆17 · Feb 12, 2024 · Updated 2 years ago
vphill / web-archiving-course
Web Archiving Course
☆23 · Mar 4, 2024 · Updated 2 years ago
Systemik-Solutions / glycerine-viewer
A VUE IIIF viewer
☆15 · Jun 5, 2026 · Updated last month
WebMemex / freeze-dry
Snapshots a web page to get it as a static, self-contained HTML document.
☆303 · Sep 18, 2022 · Updated 3 years ago
ArchiveBox / abx-spec-behaviors
🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en…
☆20 · Jul 11, 2025 · Updated last year
hadro / directory-pipeline
A pipeline for turning digital collections into structured data -- an LLM assisted, IIIF-native tool to jump into working with sources li…
☆16 · Updated this week
caltechlibrary / caltechdata_api
Python library for using the CaltechDATA API
☆12 · Jul 22, 2026 · Updated last week
marriott-library / MaRMAT-Beta
This is a metadata assessment tool to query spreadsheet-based digital collection metadata against lexicons of offensive and outdated term…
☆18 · Jun 18, 2025 · Updated last year
ArchiveBox / DigestBox
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch…
☆22 · Feb 2, 2024 · Updated 2 years ago
harvard-lil / warc-gpt
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
☆274 · Feb 11, 2025 · Updated last year
TaylorJadin / site-archiving-toolkit
☆10 · Dec 3, 2025 · Updated 7 months ago
recogito / recogito-studio
Self hosting code for Recogito-Studio
☆23 · Jul 6, 2026 · Updated 3 weeks ago
jptmoore / maniiifest
Typesafe IIIF presentation v3 parsing without external dependencies
☆12 · Jun 29, 2026 · Updated last month
palcilibraries / CC-Plus-Legacy-Deprecated
Consortia Collaborating on a Platform for Usage Statistics
☆11 · Aug 7, 2025 · Updated 11 months ago
dpla / dpla-frontend
React application for the Digital Public Library of America website
☆32 · Jul 14, 2026 · Updated 2 weeks ago
gonejack / webarchive-to-singlefile
This command line converts .webarchive file to resources embed .html file
☆23 · Mar 3, 2026 · Updated 4 months ago
N0taN3rd / simplechrome
Webrecorders DevTools Protocol Automation Library
☆18 · Oct 18, 2022 · Updated 3 years ago
bodleian / iiif-static-choices
A IIIF static tile and manifest generator built using Python to generate IIIF tiled images and manifests. This application was put toget…
☆11 · Jul 9, 2026 · Updated 2 weeks ago
internetarchive / iiif
The official Internet Archive IIIF service
☆27 · Jul 17, 2026 · Updated last week
oldweb-today / remote-desktop-server
A set of Docker images for streaming a remote desktop video and audio
☆27 · May 15, 2023 · Updated 3 years ago
reprozip-news-apps / reprozip-web
ReproZip for the Preservation of Web Applications
☆17 · May 6, 2024 · Updated 2 years ago
iipc / awesome-web-archiving
An Awesome List for getting started with web archiving
☆2,607 · Apr 27, 2026 · Updated 3 months ago
ArchiveTeam / ludios_wpull
wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
☆31 · Sep 20, 2025 · Updated 10 months ago