zytedata/web-snap

Create "perfect" snapshots of web pages

☆34

Alternatives and similar repositories for web-snap

Users who are interested in web-snap are comparing it to the libraries listed below.

zytedata / zyte-spider-templates
Spider templates for automatic crawlers.
☆35 · Mar 26, 2026 · Updated 3 months ago
ArchiveBox / pip-archivebox
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
☆12 · Oct 5, 2024 · Updated last year
internetarchive / sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
☆28 · Jul 31, 2024 · Updated last year
thuyendangdinh / tutix404
☆35 · Jun 6, 2024 · Updated 2 years ago
ArchiveBox / homebrew-archivebox
Homebrew formula for the ArchiveBox self-hosted internet archiving solution.
☆28 · Jun 14, 2026 · Updated last month
scrapy / pypydispatcher
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
☆16 · Jul 3, 2017 · Updated 9 years ago
Red-Mafia / boss404
☆12 · Feb 17, 2022 · Updated 4 years ago
zytedata / flattering
Flatten, format, and export any JSON-like data to CSV (or any other string output).
☆17 · Sep 13, 2021 · Updated 4 years ago
scrapinghub / page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
☆33 · Mar 6, 2017 · Updated 9 years ago
Redrrx / ProxyNest
Managing proxies for scaled data scraping and other automation operations will eventually require something like ProxyNest. ProxyNest is …
☆22 · Feb 5, 2024 · Updated 2 years ago
rkrzr / dataset-popular
A dataset of popular pages (taken from <dir.yahoo.com>) with manually marked up semantic blocks.
☆15 · Feb 9, 2014 · Updated 12 years ago
TarekJor / bookmark-archiver
🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...
☆38 · Aug 12, 2018 · Updated 7 years ago
aydianosec / CVE2021-40444
☆10 · Mar 30, 2023 · Updated 3 years ago
ShinyTrinkets / twofold.ts
TwoFold (2✂︎f). Text files breathe fire.
☆23 · Jan 28, 2026 · Updated 5 months ago
nla / outbackcdx
Web archive index server based on RocksDB
☆43 · Jul 9, 2026 · Updated 2 weeks ago
lopuhin / kaggle-rsna-2019
https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/
☆19 · Oct 20, 2019 · Updated 6 years ago
microlinkhq / is-antibot
Detect anti-bot protection from 20+ providers: CloudFlare, Akamai, DataDome, PerimeterX, Kasada, Imperva, reCAPTCHA, hCaptcha, Turnstile…
☆40 · Jul 14, 2026 · Updated last week
zytedata / zyte-autoextract
Python clients for Zyte AutoExtract API
☆41 · Jan 17, 2022 · Updated 4 years ago
zytedata / html-text
☆20 · Oct 6, 2025 · Updated 9 months ago
jshimko / default-backend
A default backend (404 page) for nginx-ingress in Kubernetes
☆15 · Jan 23, 2018 · Updated 8 years ago
scrapinghub / web-poet
Web scraping Page Objects core library
☆107 · Jul 10, 2026 · Updated 2 weeks ago
tiefling-cat / ru-syntax
Repository for ru-syntax command line tool.
☆15 · Mar 8, 2022 · Updated 4 years ago
zonemaster / zonemaster-backend
The Zonemaster Backend - part of the Zonemaster project
☆17 · Updated this week
sixpacksecurity / CVE-2021-40438
CVE-2021-40438 exploit PoC with Docker setup.
☆14 · Oct 24, 2021 · Updated 4 years ago
rfcxv / CVE-2021-40444-POC
☆19 · Sep 9, 2021 · Updated 4 years ago
TerminalFi / Sublime-Text-API-Tracker
Sublime Text API Version Documenter
☆11 · Jan 3, 2023 · Updated 3 years ago
daijro / geoip-all-in-one
Merges IP2Location + GeoLite2 + DB-IP and more into a single, highly accurate mmdb. Rebuilt weekly.
☆35 · Updated this week
chunkeey / FritzBox-4040-UBOOT
u-boot addon image for the AVM FritzBox 4040
☆15 · Mar 8, 2026 · Updated 4 months ago
ArchiveBox / debian-archivebox
Home of the official apt/deb package for Ubuntu/Debian-based systems.
☆17 · Updated this week
Galxe / docs
Get started with building on Galxe protocols and applications.
☆16 · May 29, 2026 · Updated last month
ArchiveBox / DigestBox
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch…
☆21 · Feb 2, 2024 · Updated 2 years ago
JoyceQuinn / Best-Advertising-And-Marketing-Management
As a digital marketing company, we have actually driven numerous website hits, millions of leads as well as millions of sales. So i…
☆12 · Nov 9, 2020 · Updated 5 years ago
alextrevisan / PS1FixedPoint
Fixed Point Math in C++ for PlayStation 1
☆13 · Aug 21, 2023 · Updated 2 years ago
mherrmann / django-404-middleware
An alternative to Django's BrokenLinkEmailsMiddleware
☆19 · Mar 25, 2020 · Updated 6 years ago
ozergoker / CVE-2021-40444
Microsoft MSHTML Remote Code Execution Vulnerability CVE-2021-40444
☆18 · Sep 29, 2021 · Updated 4 years ago
LinnielDW / UnityDeviceUniqueIdentifierHarness
A simple applet that launches a Unity game instance to copy your unique system identifier.
☆13 · Oct 10, 2020 · Updated 5 years ago
BrandonXLF / wikipedia-user-scripts
My collection of scripts that can be used on MediaWiki sites such as Wikipedia.
☆20 · Jul 6, 2026 · Updated 2 weeks ago
shellbear / dokku-go-example
Easily deploy your Go applications with Dokku.
☆11 · Jul 22, 2025 · Updated last year
TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago