zytedata / web-snap
Create "perfect" snapshots of web pages
☆32 Updated 2 months ago
Alternatives and similar repositories for web-snap:
Users who are interested in web-snap are comparing it to the libraries listed below.
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆18 Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 Updated last month
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated 4 months ago
- A Cloudflare Worker to render embeds on a single page using oEmbed ☆19 Updated 2 years ago
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file. ☆14 Updated 3 years ago
- A Bot that searches for and posts links to archived versions of articles after scanning all of HackerNews' top articles for those that co… ☆22 Updated 2 years ago
- A helper library full of URL-related heuristics. ☆64 Updated 4 months ago
- Scrapy rotation proxy package with advanced functions ☆94 Updated 2 years ago
- Limier is a small CLI tool for finding an RSS feed when it is tucked away on a site. ☆19 Updated last year
- Search & Browse xkcd by topic & keywords | Using Typesense, an open source Algolia alternative and an easier-to-use alternative to Elasti… ☆19 Updated 3 weeks ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆21 Updated 7 months ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆20 Updated last year
- Proxies third-party PDF files and HTML pages with the Hypothesis client embedded, so you can annotate them ☆21 Updated this week
- Tool to index and serve HTML files. Powered by Datasette. ☆95 Updated 2 years ago
- Host-free RSS reader in your browser. ☆15 Updated last year
- A single tab web browser built with puppeteer. Also, no client-side JS. Viewport is streamed with MJPEG. For realz. ☆56 Updated last year
- Real-time insights into the news you read ☆29 Updated 2 years ago
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆16 Updated 4 months ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 Updated 4 months ago
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆30 Updated 2 years ago
- JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆39 Updated 5 months ago
- Awesome links related to RSS, ATOM, and Syndication formats. ☆53 Updated 6 months ago
- Node.js web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆110 Updated last year
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆52 Updated 6 months ago
- 🌱 goClone - clone websites in seconds ☆66 Updated this week
- Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog! ☆14 Updated 3 months ago
- A debian:buster-slim full-text-rss Docker Container ☆13 Updated 4 years ago
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆148 Updated 2 weeks ago
- Simple script to convert web resources to a single WARC file ☆20 Updated last year
- A code editing & sharing utility ☆12 Updated last year