webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
★271 · Updated last week
Alternatives and similar repositories for browsertrix
Users who are interested in browsertrix are comparing it to the libraries listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ★775 · Updated this week
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ★159 · Updated 2 weeks ago
- wabac.js - Web Archive Browsing Augmentation Client ★107 · Updated this week
- Specifications developed and maintained by the Webrecorder community. ★131 · Updated 4 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ★113 · Updated 3 weeks ago
- Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder) ★446 · Updated 4 years ago
- WARC and ARC indexing and discovery tools. ★123 · Updated 2 months ago
- Web Archiving Integration Layer: One-Click User Instigated Preservation ★374 · Updated 2 months ago
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ★95 · Updated 6 years ago
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ★312 · Updated 2 weeks ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ★160 · Updated 4 years ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ★54 · Updated last year
- The OpenWayback Development ★497 · Updated last year
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ★31 · Updated last week
- A tool for detecting viruses and NSFW material in WARC files ★14 · Updated 9 months ago
- Command line tool for digging into WARC files ★40 · Updated last week
- Centralised repository for WARC usage specifications. ★110 · Updated 5 months ago
- ★43 · Updated last year
- Web archive index server based on RocksDB ★34 · Updated last week
- A list of things related to software, literature, and other content for Memento ★97 · Updated 11 months ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ★85 · Updated 3 weeks ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ★120 · Updated 4 months ago
- Converts WARC files to static HTML ★44 · Updated 10 months ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ★80 · Updated last month
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ★177 · Updated 7 months ago
- An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services includ… ★1,959 · Updated this week
- Tool and library for handling Web ARChive (WARC) files. ★158 · Updated 7 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ★40 · Updated 2 weeks ago
- Creates a complete full text historical archive for an RSS or ATOM feed. ★119 · Updated last week
- A Memento Aggregator CLI and Server in Go ★64 · Updated 2 months ago