zytedata / web-snap
Create "perfect" snapshots of web pages
☆32 · Updated 4 months ago
Alternatives and similar repositories for web-snap:
Users who are interested in web-snap are comparing it to the repositories listed below.
- A Cloudflare Worker to render embeds on a single page using oEmbed ☆19 · Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 · Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆17 · Updated last month
- Web archive index server based on RocksDB ☆34 · Updated 5 months ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆15 · Updated 3 years ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆31 · Updated last month
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 · Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆40 · Updated this week
- Command line tool for digging into WARC files ☆39 · Updated 3 weeks ago
- A tool for detecting viruses and NSFW material in WARC files ☆14 · Updated 8 months ago
- CDXJ Indexing of WARC/ARCs ☆25 · Updated 4 months ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆21 · Updated 9 months ago
- Decentralized web archiving ☆20 · Updated 6 years ago
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆31 · Updated 2 years ago
- Digital Preservation of HTTP in documentary heritage. ☆22 · Updated last year
- Static Site Generator for Viewing Web Archives (in WACZ) format ☆25 · Updated last year
- A helper library full of URL-related heuristics. ☆69 · Updated last month
- Support for writing WARC files with Scrapy ☆21 · Updated 5 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 · Updated 6 months ago
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆25 · Updated 2 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆26 · Updated 8 months ago
- 🛡️📧 Protect e-mails against spam and scraping bots ☆33 · Updated 3 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆53 · Updated 8 months ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆58 · Updated 9 months ago
- wabac.js - Web Archive Browsing Augmentation Client ☆107 · Updated 2 weeks ago
- Converts HTTrack crawls to WARC files ☆32 · Updated 8 months ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol ☆18 · Updated 11 months ago
- Simple script to convert web resources to a single WARC file ☆21 · Updated last year