webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★328 · Updated last week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the repositories listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ★873 · Updated this week
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ★170 · Updated last week
- Specifications developed and maintained by the Webrecorder community. ★136 · Updated 8 months ago
- wabac.js - Web Archive Browsing Augmentation Client ★113 · Updated this week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ★355 · Updated 4 months ago
- Core Python Web Archiving Toolkit for replay and recording of web archives ★1,555 · Updated 3 weeks ago
- Converts WARC files to static HTML ★48 · Updated last year
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ★90 · Updated last month
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ★36 · Updated 4 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ★131 · Updated last month
- A list of things related to software, literature, and other content for Memento ★99 · Updated last year
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ★129 · Updated 3 weeks ago
- Centralised repository for WARC usage specifications. ★116 · Updated 9 months ago
- Web archive index server based on RocksDB ★35 · Updated 2 weeks ago
- Indelible links ★479 · Updated last week
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ★54 · Updated last year
- ★52 · Updated last year
- Web Archiving Integration Layer: One-Click User Instigated Preservation ★377 · Updated 6 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ★50 · Updated last week
- Chrome extension to "Create WARC files from any webpage" ★223 · Updated last year
- Command line tool for digging into WARC files ★46 · Updated last week
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ★97 · Updated 6 years ago
- brozzler - distributed browser-based web crawler ★739 · Updated last week
- A Tool To Push Web Resources Into Web Archives ★422 · Updated last year
- A Memento Aggregator CLI and Server in Go ★69 · Updated 6 months ago
- A tool for detecting viruses and NSFW material in WARC files ★16 · Updated last year
- The OpenWayback Development ★504 · Updated last year
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ★182 · Updated 11 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ★70 · Updated 5 months ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ★164 · Updated 3 weeks ago