webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★289, updated this week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the repositories listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container (★812, updated last week)
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. (★162, updated last week)
- Specifications developed and maintained by the Webrecorder community. (★131, updated 5 months ago)
- wabac.js - Web Archive Browsing Augmentation Client (★108, updated last week)
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. (★114, updated 3 weeks ago)
- Core Python Web Archiving Toolkit for replay and recording of web archives (★1,517, updated last month)
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity (★95, updated 6 years ago)
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) (★161, updated 3 weeks ago)
- The OpenWayback Development (★499, updated last year)
- Convert Directories, Files and ZIP Files to Web Archives (WARC) (★85, updated 2 months ago)
- A list of things related to software, literature, and other content for Memento (★99, updated last year)
- Centralised repository for WARC usage specifications. (★113, updated 7 months ago)
- (★46, updated last year)
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… (★80, updated 2 months ago)
- Tool and library for handling Web ARChive (WARC) files. (★159, updated 8 months ago)
- Converts WARC files to static HTML (★44, updated 11 months ago)
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. (★318, updated last month)
- brozzler - distributed browser-based web crawler (★720, updated 2 weeks ago)
- WARC and ARC indexing and discovery tools. (★124, updated 3 months ago)
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service (★180, updated 8 months ago)
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. (★32, updated last month)
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head (★170, updated 5 years ago)
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. (★124, updated 5 months ago)
- Command line tool for digging into WARC files (★40, updated 2 weeks ago)
- Web archive index server based on RocksDB (★34, updated last month)
- CDXJ Indexing of WARC/ARCs (★26, updated 6 months ago)
- A tool for detecting viruses and NSFW material in WARC files (★15, updated 10 months ago)
- JS Streaming WARC IO optimized for Browser and Node (★44, updated 2 months ago)
- NOTE: This project is no longer being actively developed. Check out https://replayweb.page / https://github.com/webrecorder/replayweb.pa… (★201, updated 5 months ago)
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. (★54, updated last year)