webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★360 · Updated this week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the repositories listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ★928 · Updated this week
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ★180 · Updated 3 months ago
- Specifications developed and maintained by the Webrecorder community. ★137 · Updated last month
- wabac.js - Web Archive Browsing Augmentation Client ★116 · Updated this week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ★384 · Updated 7 months ago
- Serverless replay of web archives directly in the browser ★861 · Updated last week
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ★105 · Updated last month
- Core Python Web Archiving Toolkit for replay and recording of web archives ★1,587 · Updated last week
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ★98 · Updated 7 years ago
- A list of things related to software, literature, and other content for Memento ★102 · Updated last year
- MediaWiki scraper: all your wiki articles in one highly compressed ZIM file ★411 · Updated this week
- Converts WARC files to static HTML ★49 · Updated 2 months ago
- Web Archiving Integration Layer: One-Click User Instigated Preservation ★384 · Updated 8 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ★132 · Updated last week
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ★55 · Updated 2 years ago
- A Tool To Push Web Resources Into Web Archives ★424 · Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ★52 · Updated last week
- ★55 · Updated last year
- Command line tool to convert a file in the WARC format to a file in the ZIM format ★75 · Updated 8 months ago
- A Memento Aggregator CLI and Server in Go ★72 · Updated 9 months ago
- Indelible links ★489 · Updated this week
- Make a ZIM file from any Web site and surf offline! ★650 · Updated last month
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ★38 · Updated last week
- Web archive index server based on RocksDB ★36 · Updated last month
- brozzler - distributed browser-based web crawler ★760 · Updated this week
- InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS ★645 · Updated last month
- Command line tool for digging into WARC files ★47 · Updated last week
- Offline Internet Archive project ★305 · Updated last year
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ★131 · Updated 2 weeks ago
- A tool for detecting viruses and NSFW material in WARC files ★17 · Updated last year