webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★309 Updated this week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the repositories listed below.
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ★166 Updated last month
- wabac.js - Web Archive Browsing Augmentation Client ★111 Updated last week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ★334 Updated 3 months ago
- Specifications developed and maintained by the Webrecorder community. ★132 Updated 6 months ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ★83 Updated last week
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ★96 Updated 6 years ago
- A list of things related to software, literature, and other content for Memento ★99 Updated last year
- Converts WARC files to static HTML ★46 Updated last year
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ★118 Updated last week
- Chrome extension to "Create WARC files from any webpage" ★222 Updated last year
- Indelible links ★475 Updated last month
- brozzler - distributed browser-based web crawler ★729 Updated this week
- Web Archiving Integration Layer: One-Click User Instigated Preservation ★377 Updated 4 months ago
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ★19 Updated 3 weeks ago
- WARC writing MITM HTTP/S proxy ★417 Updated 2 weeks ago
- A Tool To Push Web Resources Into Web Archives ★420 Updated last year
- A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, B… ★370 Updated 2 months ago
- Web archive index server based on RocksDB ★34 Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ★44 Updated last week
- Creates a complete full text historical archive for an RSS or ATOM feed. ★123 Updated last week
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ★163 Updated 3 weeks ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ★125 Updated 7 months ago
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ★180 Updated 9 months ago
- A Memento Aggregator CLI and Server in Go ★67 Updated 5 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ★64 Updated 4 months ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ★35 Updated 3 months ago
- The OpenWayback Development ★500 Updated last year
- Command line tool for digging into WARC files ★44 Updated 2 weeks ago
- Centralised repository for WARC usage specifications. ★115 Updated 8 months ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ★54 Updated last year