zytedata / web-snap
Create "perfect" snapshots of web pages
☆32 Updated 6 months ago
Alternatives and similar repositories for web-snap
Users who are interested in web-snap are comparing it to the libraries listed below:
- A helper library full of URL-related heuristics. ☆69 Updated 2 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆42 Updated last week
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 Updated 4 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- Coldbrew is Python compiled into JavaScript using Emscripten. ☆31 Updated 2 years ago
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆18 Updated 3 months ago
- A ServiceWorker for client-side reconstruction of composite mementos ☆15 Updated 3 months ago
- Spider templates for automatic crawlers. ☆29 Updated this week
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated 8 months ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆21 Updated last year
- ☆10 Updated last year
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆25 Updated 2 years ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol ☆20 Updated last year
- Query IMDB inside your browser ☆14 Updated 7 months ago
- Scrape HN to track links from specific domains ☆61 Updated this week
- 📦 Modern strongly typed Python library for managing system dependencies with package managers like apt, brew, pip, npm, etc. ☆19 Updated 2 months ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆25 Updated 11 months ago
- Trough: Big data, small databases. ☆42 Updated 11 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆28 Updated 4 years ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 Updated 8 months ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆15 Updated 4 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆60 Updated 11 months ago
- Repository to allow collaboration between Cycle Labs Cloud community in support of the community. ☆9 Updated 3 years ago
- Home of the official apt/deb package for Ubuntu/Debian-based systems. ☆17 Updated 8 months ago
- Use SQL to instantly query stories, users and other items from Hacker News. Open source CLI. No DB required. ☆17 Updated 2 weeks ago
- YaBSON is a library allowing schemaless binary-encoded parsing/serialization of JavaScript data with a generator-based implementation ☆14 Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 Updated this week
- Decentralized web archiving ☆20 Updated 6 years ago
- Webrecorder Automated In-Page Behavior Framework ☆13 Updated 4 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆26 Updated 10 months ago