scrapy-plugins/scrapy-pagestorage

A Scrapy extension to store request and response information in a storage service

☆27

Alternatives and similar repositories for scrapy-pagestorage

Users that are interested in scrapy-pagestorage are comparing it to the libraries listed below.

scrapy-plugins / scrapy-dotpersistence
A scrapy extension to sync `.scrapy` folder to an S3 bucket
☆18 · Mar 28, 2022 · Updated 4 years ago
scrapy-plugins / scrapy-magicfields
Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc.
☆56 · Mar 16, 2022 · Updated 4 years ago
scrapy-plugins / scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
☆24 · Jun 30, 2016 · Updated 10 years ago
scrapy-plugins / scrapy-streaming
☆19 · Oct 12, 2016 · Updated 9 years ago
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45 · Mar 26, 2021 · Updated 5 years ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆38 · Apr 28, 2017 · Updated 9 years ago
scrapy-plugins / scrapy-deltafetch
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
☆276 · Feb 26, 2025 · Updated last year
scrapy / scrapy-bench
A CLI for benchmarking Scrapy.
☆32 · Jun 28, 2025 · Updated last year
redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 · Oct 26, 2017 · Updated 8 years ago
stav / scrapybox
Scrapy GUI
☆12 · Feb 26, 2021 · Updated 5 years ago
TeamHG-Memex / domain-discovery-crawler
Broad crawler for domain discovery
☆20 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / scrapy-crawl-once
Scrapy middleware which allows to crawl only new content
☆80 · Apr 8, 2026 · Updated 3 months ago
istresearch / traptor
Traptor -- A distributed Twitter feed
☆26 · Sep 12, 2022 · Updated 3 years ago
Tiago-Lira / scrapyd-mongodb
Library designed to replace the SQLite backend by a MongoDB backend on Scrapy queue management
☆17 · Sep 2, 2017 · Updated 8 years ago
bigyak / wild-yak
The Yak
☆16 · May 11, 2018 · Updated 8 years ago
scrapinghub / webpager
Paginating the web
☆37 · Feb 11, 2014 · Updated 12 years ago
scrapinghub / scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
☆26 · Jun 8, 2016 · Updated 10 years ago
olist / hulks
Olist custom linting hooks
☆26 · Aug 2, 2023 · Updated 2 years ago
jondot / pgpipeline
A Scrapy pipeline module to persist items to a postgres table automatically.
☆21 · Aug 14, 2017 · Updated 8 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
Wiredcraft / foundation
A simple Metalsmith website boilerplate (Gulp included)
☆13 · Aug 21, 2015 · Updated 10 years ago
llonchj / scrapy-sentry
Sentry component for Scrapy
☆84 · Aug 21, 2023 · Updated 2 years ago
umputun / mongo-auth
mongo docker with auth
☆12 · Jul 24, 2018 · Updated 8 years ago
eliasdorneles / drawingapp-voc
Drawing App for Android written in Python, powered by BeeWare suite - https://pybee.org
☆20 · Oct 17, 2017 · Updated 8 years ago
alexkorol / repo2GPT
Clone any GitHub repo and flatten it into a file-tree diagram plus one consolidated code file - quick repo context for LLMs
☆10 · Nov 21, 2025 · Updated 8 months ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
patarapolw / pyhandsontable
View a list of JSON-serializable dictionaries or a 2-D array, in HandsOnTable, in Jupyter Notebook.
☆13 · Oct 11, 2018 · Updated 7 years ago
alecxe / scrapy-fake-useragent
Random User-Agent middleware based on fake-useragent
☆688 · Sep 18, 2023 · Updated 2 years ago
valnub / framework7-simplest-template
The simplest Framework7 template possible
☆18 · Mar 22, 2023 · Updated 3 years ago
ramonsaraiva / django-expiry
Expiry rules for Django sessions
☆22 · Jul 17, 2020 · Updated 6 years ago
Granitosaurus / parsel-cli
cli for evaluating css and xpath selectors
☆29 · Jul 4, 2023 · Updated 3 years ago
AccordBox / awesome-scrapy
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
☆561 · Dec 28, 2022 · Updated 3 years ago
Zyncco / Chrome-Extension
☆11 · Oct 5, 2017 · Updated 8 years ago
tony / awesome-tmux-configs
Add your configs for tmux
☆18 · Apr 3, 2022 · Updated 4 years ago
LiuXingMing / cnn_on_captcha
CNN-based CAPTCHA recognition (学库宝)
☆16 · May 30, 2018 · Updated 8 years ago
philoL / csc120-summer-2019-assignments
☆10 · Aug 2, 2019 · Updated 6 years ago
johndatserakis / chrome-ribbon-reminder
🎀 A Chrome extension written using Vue and Async/Await. Uses a popup display and changes badge counts.
☆14 · Oct 28, 2024 · Updated last year
Suor / flaws
Finds flaws in your python code
☆40 · Nov 23, 2017 · Updated 8 years ago