scrapinghub/webpager

Paginating the web

☆37

Alternatives and similar repositories for webpager

Users that are interested in webpager are comparing it to the libraries listed below.

scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
pydepta / pydepta
A python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago
rmax / scrapy-boilerplate
Small set of utilities to simplify writing Scrapy spiders.
☆50 · Jul 24, 2015 · Updated 11 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
scrapinghub / scaws
Extensions for using Scrapy on Amazon AWS
☆32 · Dec 5, 2012 · Updated 13 years ago
scrapinghub / skinfer
Skinfer is a tool for inferring and merging JSON schemas
☆141 · Apr 24, 2024 · Updated 2 years ago
scrapinghub / scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
☆26 · Jun 8, 2016 · Updated 10 years ago
zytedata / flattering
Flatten, format, and export any JSON-like data to CSV (or any other string output).
☆17 · Sep 13, 2021 · Updated 4 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 · Dec 17, 2021 · Updated 4 years ago
TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆65 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / page_clustering
A simple algorithm for clustering web pages, suitable for crawlers
☆33 · Mar 6, 2017 · Updated 9 years ago
scrapinghub / adblockparser
Python parser for Adblock Plus filters
☆202 · Feb 20, 2019 · Updated 7 years ago
xfifix / SEO_REPO
Keywords enrichment by autocompletion (AWS, PM, RDC, CDS, ...), google suggestion scraping Heavy multithreaded semantic corpus crawler S…
☆12 · May 22, 2015 · Updated 11 years ago
richardcornish / django-registrationwall
A Django mixin to raise a metered registration wall
☆10 · Aug 30, 2017 · Updated 8 years ago
scrapy-plugins / scrapy-pagestorage
A scrapy extension to store requests and responses information in storage service
☆27 · Mar 11, 2022 · Updated 4 years ago
scrapy-plugins / scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
☆24 · Jun 30, 2016 · Updated 10 years ago
bigyak / wild-yak
The Yak
☆16 · May 11, 2018 · Updated 8 years ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
scrapinghub / scmongo
MongoDB extensions for Scrapy
☆44 · Oct 2, 2014 · Updated 11 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 · Jan 12, 2017 · Updated 9 years ago
otto-de-legacy / drop-in.js
A good start to do Real User Monitoring (RUM) in your project with this simple drop-in js file.
☆11 · Oct 7, 2013 · Updated 12 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago
drvinceknight / EdinburghFringeJokes
A repo for a blog post looking at the Edinburgh Fringe Festival jokes
☆17 · Apr 11, 2021 · Updated 5 years ago
rmax / scrapy-inline-requests
A decorator to write coroutine-like spider callbacks.
☆109 · Dec 26, 2022 · Updated 3 years ago
jointakahe / taktivitypub
A Python library for parsing and creating ActivityPub messages
☆22 · Mar 13, 2024 · Updated 2 years ago
ahmadassaf / KBE
Node.js application to extract the knowledge represented in Google infoboxes (aka Google Knowledge Graph Panel)
☆26 · Feb 28, 2017 · Updated 9 years ago
datalib / StatsCounter
Python's missing statistical Swiss Army knife
☆15 · Aug 25, 2015 · Updated 10 years ago
w3c-ccg / vp-request-spec
Specification for a query language to request Verifiable Presentations from wallets etc.
☆10 · Apr 23, 2026 · Updated 3 months ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
jvanz / libwarc
C++ library to parse WARC files
☆11 · Jan 27, 2019 · Updated 7 years ago
scrapinghub / flatson
Tool to flatten stream of JSON-like objects, configured via schema
☆33 · Oct 19, 2019 · Updated 6 years ago
ssteuteville / scrapyz
"Scrape Easy" - an extension of the Scrapy framework.
☆185 · Aug 13, 2016 · Updated 9 years ago
seagatesoft / webdext
Intelligent Web Data Extractor
☆74 · Dec 5, 2022 · Updated 3 years ago
mokeyish / QuickGraph
☆14 · May 10, 2020 · Updated 6 years ago
benbalter / naughty_or_nice
You've made the list, we'll help you check it twice. Given a domain-like string, verifies inclusion in a list you provide.
☆19 · Nov 13, 2020 · Updated 5 years ago
TeamHG-Memex / aquarium
Splash + HAProxy + Docker Compose
☆195 · Apr 8, 2026 · Updated 3 months ago