gnemoug / distribute_crawler

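A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite. A MongoDB cluster provides the underlying storage, Redis handles distribution, and Graphite displays crawler status.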

☆3,257

Alternatives and similar repositories for distribute_crawler

Users that are interested in distribute_crawler are comparing it to the libraries listed below

LiuXingMing / SinaSpider
Sina Weibo crawler (Scrapy, Redis)
☆3,282 Updated 7 years ago
marchtea / scrapy_doc_chs
Chinese translation of the Scrapy documentation
☆1,108 Updated 6 years ago
rmax / scrapy-redis
Redis-based components for Scrapy.
☆5,639 Updated last year
xianhu / PSpider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
☆1,837 Updated 3 years ago
yidao620c / core-scrapy
python-scrapy demo
☆810 Updated 5 years ago
geekan / scrapy-examples
Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.
☆3,256 Updated last year
qinxuye / cola
A high-level distributed crawling framework.
☆1,507 Updated 3 years ago
awolfly9 / IPProxyTool
Python IP proxy tool using Scrapy crawls. Scrapes large numbers of free proxy IPs and extracts the valid ones for use.
☆2,002 Updated 2 years ago
FullerHua / gooseeker
☆695 Updated 8 years ago
egrcc / zhihu-python
Retrieves Zhihu content, including questions, answers, users, and collections.
☆2,322 Updated 3 years ago
kohn / HttpProxyMiddleware
A middleware for scrapy. Used to change HTTP proxy from time to time.
☆323 Updated 7 years ago
qiyeboy / IPProxyPool
IPProxyPool, a proxy pool project that provides proxy IPs
☆4,241 Updated 7 years ago
scrapy / dirbot
Scrapy project to scrape public web directories (educational) [DEPRECATED]
☆1,630 Updated 7 years ago
LiuRoy / zhihu_spider
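Zhihu crawler
☆1,261 Updated 9 years ago
7sDream / zhihu-py3
[No longer maintained] The successor, zhihu-oauth (https://github.com/7sDream/zhihu-oauth), has been taken down under the DMCA and is also no longer developed; only a code archive is provided:
☆1,038 Updated 9 years ago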
MorganZhang100 / zhihu-spider
A web spider for zhihu.com
☆726 Updated last year
luyishisi / Anti-Anti-Spider
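More and more websites have anti-crawler features: some hide key data in images, others use inhumanly difficult CAPTCHAs. This is a code repository for anti-anti-crawler techniques, building skills by taking on websites with different characteristics (without malicious intent). (Submissions of hard-to-scrape websites are welcome.) (Project paused due to work commitments.)
☆7,299 Updated 4 years ago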
ramsayleung / jd_spider
Two dumb distributed crawlers
☆724 Updated 6 years ago
bowenpay / wechat-spider
WeChat official account crawler
☆3,271 Updated 4 years ago
michaelliao / sinaweibopy
Sina Weibo Python SDK
☆1,273 Updated 4 years ago
xchaoinfo / fuck-login
Simulates logging in to some well-known websites, to make it easier to crawl sites that require login
☆5,897 Updated 7 years ago
lanbing510 / LianJiaSpider
Lianjia crawler
☆687 Updated 9 years ago
paicha / gxgk-wechat-server
Campus WeChat official account, built with Python, Flask, Redis, MySQL, and Celery [DEPRECATED]
☆1,392 Updated 3 years ago
lzjun567 / zhihu-api
Zhihu API for Humans
☆980 Updated 4 years ago
lucasjinreal / weibo_terminater
The final Weibo crawler: scrape anything from Weibo, including comments, Weibo contents, followers, anything. The Terminator
☆2,320 Updated 5 years ago
SpiderClub / weibospider
A distributed crawler for Weibo, built with Celery and Requests.
☆4,815 Updated 5 years ago
gnemoug / sina_reptile
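Collects basic profile information for 10 million Sina Weibo users, plus the 50 most recent posts of each crawled user. Written in Python, crawls with multiple processes, and stores the data in MongoDB.
☆474 Updated 12 years ago
taizilongxu / scrapy_jingdong
A JD.com (Jingdong) crawler written with Scrapy
☆451 Updated 10 years ago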
chyroc / WechatSogou
WeChat official account crawler interface based on Sogou WeChat search
☆6,130 Updated last year
lanbing510 / DouBanSpider
Crawler for Douban Books
☆2,749 Updated 5 years ago