scrapy/dirbot

Scrapy project to scrape public web directories (educational) [DEPRECATED]

☆1,627

Alternatives and similar repositories for dirbot

Users who are interested in dirbot are comparing it to the repositories listed below.

geekan / scrapy-examples
Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.
☆3,254 · Nov 3, 2023 · Updated 2 years ago
scrapy / quotesbot
This is a sample Scrapy project for educational purposes
☆1,363 · Nov 29, 2023 · Updated 2 years ago
marchtea / scrapy_doc_chs
Chinese translation of the Scrapy documentation
☆1,104 · Sep 12, 2019 · Updated 6 years ago
gnemoug / distribute_crawler
A distributed web crawler built with scrapy, redis, mongodb, and graphite: a mongodb cluster provides the underlying storage, redis handles distribution, and graphite displays crawler status
☆3,243 · Apr 18, 2017 · Updated 9 years ago
scrapy / scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
☆63,273 · Updated this week
rmax / scrapy-redis
Redis-based components for Scrapy.
☆5,645 · May 19, 2026 · Updated 2 months ago
binux / pyspider
A Powerful Spider (Web Crawler) System in Python.
☆16,797 · Apr 30, 2024 · Updated 2 years ago
zhijunio / scrapy-zhihu-github
scrapy examples for crawling zhihu and github
☆221 · Jan 11, 2023 · Updated 3 years ago
maxliaops / scrapy-itzhaopin
☆94 · Apr 28, 2014 · Updated 12 years ago
scrapy-plugins / scrapy-jsonrpc
Scrapy extension to control spiders using JSON-RPC
☆299 · Aug 26, 2019 · Updated 6 years ago
yoyzhou / weibo_scrapy
WEIBO_SCRAPY is a Multi-Threading SINA WEIBO data extraction Framework in Python.
☆155 · Jun 3, 2026 · Updated last month
Andrew-liu / scrapy_example
This repository stores some examples for learning scrapy better
☆176 · Oct 9, 2020 · Updated 5 years ago
scrapinghub / portia
Visual scraping for Scrapy
☆9,506 · Jun 26, 2024 · Updated 2 years ago
wuchong / scrapy-dynamic-configurable
A dynamically configurable news crawler based on Scrapy
☆164 · Jul 24, 2017 · Updated 8 years ago
jackgitgz / CnblogsSpider
A crawler that uses scrapy to collect cnblogs list pages
☆274 · Jun 16, 2015 · Updated 11 years ago
scrapy / scrapyd
A service daemon to run Scrapy spiders
☆3,097 · Updated this week
egrcc / zhihu-python
Fetches Zhihu content, including questions, answers, users, and favorites collections
☆2,332 · Feb 8, 2022 · Updated 4 years ago
scrapy-plugins / scrapy-splash
Scrapy+Splash for JavaScript integration
☆3,229 · Feb 11, 2025 · Updated last year
qinxuye / cola
A high-level distributed crawling framework.
☆1,500 · Jul 31, 2022 · Updated 3 years ago
chenqx / spiderDemo
☆23 · Jan 31, 2015 · Updated 11 years ago
mjhea0 / Scrapy-Samples
Scrapy examples crawling Craigslist
☆199 · Apr 20, 2016 · Updated 10 years ago
sebdah / scrapy-mongodb
MongoDB pipeline for Scrapy. This module supports both MongoDB in standalone setups and replica sets. scrapy-mongodb will insert the item…
☆358 · Apr 6, 2021 · Updated 5 years ago
grangier / python-goose
Html Content / Article Extractor, web scraping lib in Python
☆4,100 · Mar 10, 2026 · Updated 4 months ago
doraemonext / wechat-python-sdk
Python SDK for the WeChat Official Accounts Platform [DEPRECATED]
☆1,347 · Oct 1, 2020 · Updated 5 years ago
fxsjy / jieba
Jieba Chinese text segmentation
☆35,076 · Aug 21, 2024 · Updated last year
psf / requests
A simple, yet elegant, HTTP library.
☆54,139 · Updated this week
MorganZhang100 / zhihu-spider
A web spider for zhihu.com
☆721 · Jan 17, 2024 · Updated 2 years ago
realpython / stack-spider
☆167 · Nov 3, 2018 · Updated 7 years ago
tornadoweb / tornado
Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
☆22,190 · Jul 8, 2026 · Updated last week
scrapy / loginform
Fill HTML login forms automatically
☆279 · Apr 24, 2024 · Updated 2 years ago
LiuXingMing / SinaSpider
Sina Weibo crawler (Scrapy, Redis)
☆3,285 · Sep 5, 2018 · Updated 7 years ago
scrapinghub / scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
☆33 · Feb 22, 2018 · Updated 8 years ago
pallets / flask
The Python micro framework for building web applications.
☆71,989 · Jun 10, 2026 · Updated last month
taizilongxu / scrapy_jingdong
A JD.com (Jingdong) crawler written with scrapy
☆453 · Dec 5, 2014 · Updated 11 years ago
kohn / HttpProxyMiddleware
A middleware for scrapy. Used to change HTTP proxy from time to time.
☆323 · Feb 1, 2018 · Updated 8 years ago
aivarsk / scrapy-proxies
Random proxy middleware for Scrapy
☆1,669 · Oct 1, 2019 · Updated 6 years ago
xchaoinfo / fuck-login
Simulates logging in to some well-known websites, to make it easier to crawl sites that require login
☆5,870 · Jun 8, 2018 · Updated 8 years ago
littlecodersh / ItChat
A complete and graceful API for Wechat. A WeChat personal-account API, WeChat bot, and command-line WeChat; a custom personal-account bot takes only thirty lines.
☆26,477 · Sep 28, 2023 · Updated 2 years ago
scrapy / scrapyd-client
Command line client for Scrapyd server
☆772 · Feb 27, 2026 · Updated 4 months ago