rmax/scrapydo

rmax / scrapydo

Crochet-based blocking API for Scrapy.

☆47

Alternatives and similar repositories for scrapydo

Users who are interested in scrapydo are comparing it to the libraries listed below.

rmax / databrewer
The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets!
☆41 · May 29, 2017 · Updated 9 years ago
TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago
stummjr / scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
☆18 · Apr 11, 2020 · Updated 6 years ago
zytedata / spidyquotes
Example site for web scraping tutorials
☆31 · Oct 9, 2024 · Updated last year
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 · Jan 12, 2017 · Updated 9 years ago
llonchj / scrapy-sentry
Sentry component for Scrapy
☆84 · Aug 21, 2023 · Updated 2 years ago
redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 · Oct 26, 2017 · Updated 8 years ago
itamarst / crochet
Crochet: use Twisted anywhere!
☆238 · Sep 3, 2024 · Updated last year
ArturGaspar / scrapy-qtwebkit
☆13 · Dec 4, 2019 · Updated 6 years ago
zytedata / html-text
☆20 · Oct 6, 2025 · Updated 9 months ago
inspirehep / hepcrawl
Scrapy project for feeds into INSPIRE-HEP
☆20 · Jun 22, 2026 · Updated last month
scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
TeamHG-Memex / arachnado
Web Crawling UI and HTTP API, based on Scrapy and Tornado
☆162 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆33 · Apr 8, 2026 · Updated 3 months ago
kserhii / money-parser
Price and currency parsing utility
☆27 · Mar 6, 2023 · Updated 3 years ago
TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆12 · Apr 8, 2026 · Updated 3 months ago
ShinyTrinkets / twofold.ts
TwoFold (2✂︎f). Text files breathe fire.
☆23 · Jan 28, 2026 · Updated 5 months ago
scrapinghub / scrapyrt
HTTP API for Scrapy spiders
☆882 · Jun 29, 2026 · Updated 3 weeks ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆38 · Apr 28, 2017 · Updated 9 years ago
stav / scrapybox
Scrapy GUI
☆12 · Feb 26, 2021 · Updated 5 years ago
scrapinghub / product-extraction-benchmark
☆16 · Apr 10, 2026 · Updated 3 months ago
alertot / detectem
detectem - detect software and its version on websites.
☆157 · Mar 25, 2021 · Updated 5 years ago
alecxe / scrapy-fake-useragent
Random User-Agent middleware based on fake-useragent
☆688 · Sep 18, 2023 · Updated 2 years ago
scrapy / scrapy-lint
A linter for Scrapy projects.
☆22 · Jul 7, 2026 · Updated 2 weeks ago
seagatesoft / webdext
Intelligent Web Data Extractor
☆74 · Dec 5, 2022 · Updated 3 years ago
TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆65 · Apr 8, 2026 · Updated 3 months ago
elacuesta / scrapy-pyppeteer
Pyppeteer integration for Scrapy
☆58 · Feb 26, 2021 · Updated 5 years ago
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jun 8, 2026 · Updated last month
rafaelcapucho / scrapy-eagle
Scrapy Eagle is a tool that allows us to run any Scrapy-based project in a distributed fashion and monitor how it is going on and how many…
☆24 · Sep 4, 2020 · Updated 5 years ago
commonsearch / gumbocy
Python binding for gumbo-parser using Cython
☆14 · Aug 16, 2016 · Updated 9 years ago
TeamHG-Memex / undercrawler
A generic crawler
☆81 · Apr 8, 2026 · Updated 3 months ago
SimonSapin / html5ever-python
Python bindings for html5ever, using CFFI
☆39 · Nov 9, 2017 · Updated 8 years ago
scrapinghub / kafka-scanner
High Level Kafka Scanner
☆19 · Sep 29, 2017 · Updated 8 years ago
scrapinghub / skinfer
Skinfer is a tool for inferring and merging JSON schemas
☆141 · Apr 24, 2024 · Updated 2 years ago
mrf345 / flask_datepicker
A Flask extension for the jQuery UI JavaScript date picker
☆16 · Jul 22, 2024 · Updated last year
scrapinghub / js2xml
Convert JavaScript code to an XML document
☆188 · Mar 14, 2022 · Updated 4 years ago
commonsearch / urlparse4
Faster replacement for Python's urlparse module
☆46 · Apr 13, 2026 · Updated 3 months ago