zytedata / web-snap
Create "perfect" snapshots of web pages
☆33 Updated last week
Alternatives and similar repositories for web-snap
Users who are interested in web-snap are comparing it to the libraries listed below.
- A helper library full of URL-related heuristics. ☆73 Updated 3 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆54 Updated last month
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆30 Updated 4 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 Updated 5 years ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆58 Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 Updated 6 months ago
- Decentralized web archiving ☆20 Updated 7 years ago
- Tool to index and serve HTML files. Powered by Datasette. ☆111 Updated 3 years ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol ☆27 Updated last year
- A curated list of well-known URIs, resources, guides and tools (RFC 5785) ☆85 Updated last year
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆186 Updated 4 months ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆12 Updated last year
- 🛡️📧 Protect e-mails against spam and scraping bots ☆35 Updated 11 months ago
- A self-hosted bookmark database with full-text page content search ☆96 Updated 7 months ago
- Create a static website with Fly - HTML from the example ☆21 Updated last year
- Spider templates for automatic crawlers. ☆33 Updated last month
- Downloads websites for long-term archival. ☆81 Updated this week
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆31 Updated 3 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆168 Updated 4 months ago
- A collection of tools for archiving and analysing the internet. ☆78 Updated 3 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆103 Updated last year
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 Updated last year
- List of proxy IP addresses used by bots ☆95 Updated this week
- Hydra: a multithreaded site-crawling link checker in Python standard library ☆124 Updated 6 months ago
- Your "yellow pages" of Enterprise Free Software Publishers, their products and success cases ☆17 Updated last year
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆31 Updated last year
- A Cloudflare Worker to render embeds on a single page using oEmbed ☆21 Updated 3 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆57 Updated 4 months ago
- Awesome links related to RSS, ATOM, and Syndication formats. ☆61 Updated last year