webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★385 · Updated this week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the libraries listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ★968 · Updated this week
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ★187 · Updated 5 months ago
- Specifications developed and maintained by the Webrecorder community. ★140 · Updated 3 months ago
- wabac.js - Web Archive Browsing Augmentation Client ★122 · Updated last week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ★411 · Updated last week
- Core Python Web Archiving Toolkit for replay and recording of web archives ★1,613 · Updated 2 weeks ago
- Converts WARC files to static HTML ★51 · Updated 4 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ★134 · Updated this week
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ★92 · Updated 9 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ★55 · Updated this week
- Web Archiving Integration Layer: One-Click User Instigated Preservation ★387 · Updated 10 months ago
- A list of things related to software, literature, and other content for Memento ★104 · Updated 3 weeks ago
- A Memento Aggregator CLI and Server in Go ★76 · Updated 11 months ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ★109 · Updated 3 months ago
- Web archive index server based on RocksDB ★38 · Updated last week
- ★56 · Updated last year
- Indelible links ★494 · Updated 2 weeks ago
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ★98 · Updated 7 years ago
- Command line tool for digging into WARC files ★50 · Updated last week
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ★169 · Updated 5 months ago
- brozzler - distributed browser-based web crawler ★783 · Updated last week
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ★132 · Updated 2 months ago
- Centralised repository for WARC usage specifications. ★124 · Updated 3 months ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ★39 · Updated 2 months ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ★55 · Updated 2 years ago
- The OpenWayback Development ★510 · Updated 2 years ago
- A tool for detecting viruses and NSFW material in WARC files ★17 · Updated last month
- Make a ZIM file from any Web site and surf offline! ★698 · Updated 2 weeks ago
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ★188 · Updated 2 weeks ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ★75 · Updated 2 weeks ago