zytedata / web-snap
Create "perfect" snapshots of web pages
☆31 · Updated last month
Alternatives and similar repositories for web-snap:
Users interested in web-snap are comparing it to the repositories listed below.
- A Cloudflare Worker to render embeds on a single page using oEmbed ☆19 · Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆18 · Updated 11 months ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆21 · Updated 6 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆52 · Updated 5 months ago
- Webrecorder Automated In-Page Behavior Framework ☆13 · Updated 3 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 · Updated 3 months ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆50 · Updated 6 years ago
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆24 · Updated 2 years ago
- The Toolkit API, app, and browser extension. Start preserving now. ☆46 · Updated last month
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆48 · Updated last week
- Hashpic creates an image from an MD5, SHA512, SHA3-512, Blake2b or SHAKE256 hash ☆27 · Updated last week
- A helper library full of URL-related heuristics. ☆64 · Updated 3 months ago
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆124 · Updated this week
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a YAML file. ☆14 · Updated 3 years ago
- Datasette plugin for searching all searchable tables at once ☆21 · Updated 4 months ago
- Tools for running enrichments against data stored in Datasette ☆21 · Updated this week
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 5 months ago
- A tool to snapshot SQLite databases you don't own ☆19 · Updated 2 months ago
- State-of-the-art web crawler 🔱 ☆98 · Updated this week
- Use SQL to instantly query stories, users and other items from Hacker News. Open source CLI. No DB required. ☆17 · Updated 2 months ago
- A Chrome extension that removes "[company] is hiring" ads on https://news.ycombinator.com/. ☆18 · Updated last year
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆14 · Updated 3 years ago
- A browser extension that can be installed by volunteers to participate in mwmbl distributed crawling. ☆23 · Updated 5 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 · Updated 3 years ago
- Datasette plugin for uploading CSV files and converting them to database tables ☆25 · Updated 9 months ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆27 · Updated 3 months ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆20 · Updated last year
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆55 · Updated this week