webrecorder / browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
☆256 Updated this week
Alternatives and similar repositories for browsertrix:
Users who are interested in browsertrix are comparing it to the repositories listed below.
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆736 Updated this week
- High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆156 Updated last week
- wabac.js - Web Archive Browsing Augmentation Client ☆107 Updated this week
- Specifications developed and maintained by the Webrecorder community. ☆129 Updated 2 months ago
- Serverless replay of web archives directly in the browser ☆777 Updated 3 weeks ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆110 Updated 2 months ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆158 Updated 4 years ago
- Web Archiving Integration Layer: One-Click User Instigated Preservation ☆370 Updated 3 weeks ago
- WARC and ARC indexing and discovery tools. ☆122 Updated 3 weeks ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆85 Updated 2 weeks ago
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆95 Updated 6 years ago
- The OpenWayback Development ☆497 Updated last year
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆29 Updated 2 weeks ago
- Centralised repository for WARC usage specifications. ☆109 Updated 4 months ago
- Converts WARC files to static HTML ☆44 Updated 9 months ago
- brozzler - distributed browser-based web crawler ☆698 Updated this week
- ☆42 Updated 11 months ago
- A Tool To Push Web Resources Into Web Archives ☆419 Updated last year
- Tool and library for handling Web ARChive (WARC) files. ☆156 Updated 5 months ago
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ☆293 Updated 2 weeks ago
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ☆177 Updated 5 months ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs. ☆54 Updated last year
- Web archive index server based on RocksDB ☆34 Updated 4 months ago
- A tool for detecting viruses and NSFW material in WARC files ☆11 Updated 7 months ago
- Command line tool for digging into WARC files ☆39 Updated this week
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆170 Updated 4 years ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆55 Updated 2 weeks ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆143 Updated last year
- A list of things related to software, literature, and other content for Memento ☆96 Updated 10 months ago
- A Memento Aggregator CLI and Server in Go ☆62 Updated last month