zytedata / web-snap
Create "perfect" snapshots of web pages
☆32 Updated 7 months ago
Alternatives and similar repositories for web-snap
Users who are interested in web-snap are comparing it to the libraries listed below.
- Automated behaviors that run in the browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆44 Updated last week
- A helper library full of URL-related heuristics. ☆70 Updated 2 months ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- Tool to index and serve HTML files. Powered by Datasette. ☆104 Updated 3 years ago
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 Updated last month
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆31 Updated 2 years ago
- 🛡️📧 Protect e-mails against spam and scraping bots ☆33 Updated 6 months ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol ☆21 Updated last year
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆29 Updated 4 years ago
- Awesome links related to RSS, Atom, and syndication formats. ☆59 Updated last year
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆57 Updated 11 months ago
- A Cloudflare Worker to render embeds on a single page using oEmbed ☆20 Updated 2 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 Updated last month
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆52 Updated 6 years ago
- Node.js web scraper. Contains a command line, Docker container, Terraform module and Ansible roles for distributed cloud scraping. Support… ☆114 Updated 2 years ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆29 Updated last year
- Hydra: a multithreaded site-crawling link checker in the Python standard library ☆126 Updated last month
- Decentralized web archiving ☆20 Updated 7 years ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆64 Updated 4 months ago
- A curated list of well-known URIs, resources, guides and tools (RFC 5785) ☆75 Updated last year
- A single tab web browser built with Puppeteer. Also, no client-side JS. Viewport is streamed with MJPEG. For realz. ☆56 Updated 2 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆99 Updated last year
- Pegao is a community about lists of links on topics of interest. ☆13 Updated 2 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 Updated last year
- Export your GitHub activity: events, repositories, stars, etc. ☆52 Updated 3 weeks ago
- An API to check social media username availability on a variety of services ☆32 Updated 2 years ago
- Spider templates for automatic crawlers. ☆30 Updated last month
- NPM package and CLI tool for saving a web page as a single HTML file ☆49 Updated this week
- The Toolkit API, app, and browser extension. Start preserving now. ☆47 Updated last month
- CommonCrawl keyword scanner. Scanning a month of CC data for hundreds of keywords on an EC2 c5.18xlarge instance takes about 3 hours. LLM (BER… ☆15 Updated 2 years ago