zytedata / web-snap
Create "perfect" snapshots of web pages
☆32 Updated 3 months ago
Alternatives and similar repositories for web-snap:
Users who are interested in web-snap are comparing it to the libraries listed below.
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol ☆16 Updated 10 months ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated 6 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 Updated this week
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆21 Updated 8 months ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 Updated 6 months ago
- Spider templates for automatic crawlers. ☆28 Updated last week
- Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog! ☆14 Updated 5 months ago
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file. ☆14 Updated 4 years ago
- Extract all internal and external links from a URL in Python. ☆13 Updated last year
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆21 Updated last year
- A code editing & sharing utility ☆12 Updated last year
- 🛡️📧 Protect e-mails against spam and scraping bots ☆32 Updated 2 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆53 Updated 7 months ago
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆30 Updated 2 years ago
- Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆111 Updated 2 years ago
- linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF. ☆14 Updated 2 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 Updated last year
- YaBSON is a library allowing schemaless binary-encoded parsing/serialization of JavaScript data with a generator-based implementation ☆14 Updated last year
- 🐍 A curated list of awesome python environment. ☆13 Updated 4 years ago
- A case management app built with Lowdefy. ☆32 Updated last year
- Real-time insights into the news you read ☆29 Updated 2 years ago
- Tabserve Issue Tracker ☆11 Updated last year
- Collection of manifest files for 100k Chrome extensions ☆76 Updated 2 months ago
- Datasette plugin for uploading CSV files and converting them to database tables ☆26 Updated 11 months ago
- ☆11 Updated 4 months ago
- Host-free RSS reader in your browser. ☆15 Updated last year
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 Updated 6 months ago
- Converts HTTrack crawls to WARC files ☆32 Updated 8 months ago
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆25 Updated 2 years ago