webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
☆344 · Updated this week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the repositories listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆895 · Updated this week
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆177 · Updated last month
- wabac.js - Web Archive Browsing Augmentation Client ☆114 · Updated 2 weeks ago
- Specifications developed and maintained by the Webrecorder community. ☆136 · Updated last week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ☆369 · Updated 5 months ago
- Serverless replay of web archives directly in the browser ☆852 · Updated last week
- Core Python Web Archiving Toolkit for replay and recording of web archives ☆1,570 · Updated this week
- Converts WARC files to static HTML ☆49 · Updated last month
- A list of things related to software, literature, and other content for 🕣 Memento ☆100 · Updated last year
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆131 · Updated this week
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆38 · Updated 5 months ago
- A Tool To Push Web Resources Into Web Archives ☆423 · Updated last year
- MediaWiki scraper: all your wiki articles in one highly compressed ZIM file ☆400 · Updated last week
- Indelible links ☆484 · Updated last month
- Web Archiving Integration Layer: One-Click User Instigated Preservation ☆381 · Updated 7 months ago
- A Memento Aggregator CLI and Server in Go ☆69 · Updated 7 months ago
- Chrome extension to "Create WARC files from any webpage" ☆224 · Updated last year
- Web archive index server based on RocksDB ☆36 · Updated last month
- Centralised repository for WARC usage specifications. ☆117 · Updated 2 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆50 · Updated 2 weeks ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆165 · Updated 2 months ago
- A tool for detecting viruses and NSFW material in WARC files ☆17 · Updated last year
- brozzler - distributed browser-based web crawler ☆756 · Updated last week
- ☆53 · Updated last year
- Command line tool for digging into WARC files ☆46 · Updated 3 weeks ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆97 · Updated 7 years ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ☆55 · Updated last year
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆89 · Updated 6 months ago
- A tool for collecting archival slivers of the web and web archives ☆16 · Updated 8 months ago
- Creates a complete full text historical archive for an RSS or ATOM feed. ☆125 · Updated last week