zytedata / web-snap
Create "perfect" snapshots of web pages
☆33 Updated 3 months ago
Alternatives and similar repositories for web-snap
Users that are interested in web-snap are comparing it to the libraries listed below
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 Updated 4 months ago
- Tool to index and serve HTML files. Powered by Datasette. ☆109 Updated 3 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆30 Updated 4 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆52 Updated this week
- wabac.js - Web Archive Browsing Augmentation Client ☆115 Updated last week
- Decentralized web archiving ☆20 Updated 7 years ago
- A Memento Aggregator CLI and Server in Go ☆71 Updated 8 months ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆54 Updated 7 years ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆130 Updated 3 months ago
- 🛡️📧 Protect e-mails against spam and scraping bots ☆33 Updated 10 months ago
- A list of things related to software, literature, and other content for Memento ☆102 Updated last year
- Awesome links related to RSS, ATOM, and Syndication formats. ☆60 Updated last year
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆180 Updated 2 months ago
- Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆112 Updated 2 years ago
- A self-hosted bookmark database with full-text page content search ☆96 Updated 6 months ago
- Collection of Python code to re-use across Python-based scrapers ☆24 Updated 2 weeks ago
- The Toolkit API, app, and browser extension. Start preserving now. ☆47 Updated last week
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 Updated last year
- A curated list of well-known URIs, resources, guides and tools (RFC 5785) ☆82 Updated last year
- Tools and libraries for interacting with the Netograph API ☆47 Updated 2 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 Updated last year
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆75 Updated 8 months ago
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆31 Updated 2 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆55 Updated 3 months ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 Updated last year
- Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more … ☆356 Updated this week
- NPM package and CLI tool for saving web page as single HTML file ☆48 Updated last week
- The ArchiveWeb.page Site ☆30 Updated 3 weeks ago
- Run a Personal VPN with global exit nodes and proxy via Tailscale IPN ☆45 Updated 8 months ago