scrapinghub/page_finder

Find which links on a web page are pagination links

☆29

Alternatives and similar repositories for page_finder

Users that are interested in page_finder are comparing it to the libraries listed below.

raidikalu / raidikalu
Lists raids and such
☆16 · Nov 2, 2022 · Updated 3 years ago
scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago
scrapinghub / flatson
Tool to flatten stream of JSON-like objects, configured via schema
☆33 · Oct 19, 2019 · Updated 6 years ago
scrapinghub / aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…
☆54 · May 21, 2024 · Updated 2 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago
rmax / databrewer-recipes
DataBrewer Recipes Repository.
☆21 · Jul 5, 2016 · Updated 10 years ago
rmax / scrapydo
Crochet-based blocking API for Scrapy.
☆47 · Feb 24, 2017 · Updated 9 years ago
scrapinghub / exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
☆39 · May 21, 2024 · Updated 2 years ago
seomoz / mltk
mltk - Moz Language Tool Kit
☆12 · Mar 6, 2015 · Updated 11 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 · Dec 17, 2021 · Updated 4 years ago
scrapinghub / andi
Library for annotation-based dependency injection
☆24 · Jul 21, 2026 · Updated last week
Parsely / serpextract
Easy extraction of keywords and engines from search engine results pages (SERPs).
☆92 · Oct 20, 2025 · Updated 9 months ago
scrapinghub / adblockparser
Python parser for Adblock Plus filters
☆202 · Feb 20, 2019 · Updated 7 years ago
pydepta / pydepta
A Python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jun 8, 2026 · Updated last month
alecxe / scrapy-beautifulsoup
Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup
☆22 · Sep 26, 2016 · Updated 9 years ago
redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 · Oct 26, 2017 · Updated 8 years ago
scrapinghub / mdr
A Python library to detect and extract listing data from HTML pages.
☆110 · May 5, 2017 · Updated 9 years ago
yarko / scrape
Interactive web scraping
☆19 · Nov 7, 2014 · Updated 11 years ago
ContinuumIO / scrapy_scrapers
Scraper built with Scrapy.
☆18 · Jun 25, 2026 · Updated last month
hoptical / grafana-skype-alerts
A webhook notifier for sending Grafana alerts to Skype
☆10 · Jun 28, 2024 · Updated 2 years ago
diffbot / wikistatsextractor
Extract statistics from Wikipedia Dump files.
☆26 · Aug 2, 2021 · Updated 4 years ago
pablohoffman / awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
☆13 · Sep 17, 2015 · Updated 10 years ago
dossier / html-highlighter
Highlight and select phrases in HTML pages.
☆24 · Nov 4, 2019 · Updated 6 years ago
scrapinghub / skinfer
Skinfer is a tool for inferring and merging JSON schemas
☆141 · Apr 24, 2024 · Updated 2 years ago
PomanoB / lsse
Serelex - lexico-semantic search engine
☆19 · Mar 19, 2017 · Updated 9 years ago
CraveFood / django-haystack-elasticsearch
☆22 · Dec 26, 2022 · Updated 3 years ago
TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago
htaghizadeh / PersianStemmingDataset
Persian stemming dataset for evaluating new stemmers
☆14 · Dec 16, 2016 · Updated 9 years ago
Parsely / schemato
Modularly extensible semantic metadata validator
☆85 · Dec 10, 2015 · Updated 10 years ago
CompileInc / hodor
🕷 Configuration-based HTML scraper
☆23 · Nov 4, 2025 · Updated 8 months ago
NUKnightLab / InstaTimeline
Collaborative Innovation Class Project
☆14 · Jun 12, 2015 · Updated 11 years ago
titu1994 / Python-Work
Python scripts to facilitate easy working
☆11 · Mar 23, 2026 · Updated 4 months ago
Code4SA / mma-dexter
Dexter document monitor for MMA
☆16 · May 8, 2024 · Updated 2 years ago
jupyterhub / simpervisor
Simple Python3 Supervisor library
☆14 · Jul 1, 2026 · Updated 3 weeks ago
nyov / scrapyext
scrapy-extras -- a collection of code samples and modules for the Scrapy framework.
☆14 · Dec 14, 2020 · Updated 5 years ago
scrapinghub / scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
☆33 · Feb 22, 2018 · Updated 8 years ago