Scrapy project to scrape public web directories (educational) [DEPRECATED]
☆1,631 Oct 27, 2017 Updated 8 years ago
Alternatives and similar repositories for dirbot
Users that are interested in dirbot are comparing it to the libraries listed below.
- Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc. ☆3,267 Nov 3, 2023 Updated 2 years ago
- This is a sample Scrapy project for educational purposes ☆1,354 Nov 29, 2023 Updated 2 years ago
- Chinese translation of the Scrapy documentation ☆1,106 Sep 12, 2019 Updated 6 years ago
- A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: storage is backed by a MongoDB cluster, distribution is implemented with Redis, and crawler status is displayed with Graphite ☆3,249 Apr 18, 2017 Updated 8 years ago
- Scrapy, a fast high-level web crawling & scraping framework for Python. ☆60,886 Updated this week
- Redis-based components for Scrapy. ☆5,636 Jul 6, 2024 Updated last year
- A Powerful Spider (Web Crawler) System in Python. ☆17,009 Apr 30, 2024 Updated last year
- scrapy examples for crawling zhihu and github ☆223 Jan 11, 2023 Updated 3 years ago
- ☆95 Apr 28, 2014 Updated 11 years ago
- Scrapy extension to control spiders using JSON-RPC ☆299 Aug 26, 2019 Updated 6 years ago
- WEIBO_SCRAPY is a Multi-Threading SINA WEIBO data extraction Framework in Python. ☆155 Jul 28, 2017 Updated 8 years ago
- This repository stores some examples for learning Scrapy better ☆177 Oct 9, 2020 Updated 5 years ago
- Visual scraping for Scrapy ☆9,497 Jun 26, 2024 Updated last year
- A dynamic configurable news crawler based on Scrapy ☆164 Jul 24, 2017 Updated 8 years ago
- A Scrapy spider that crawls cnblogs list pages ☆274 Jun 16, 2015 Updated 10 years ago
- A service daemon to run Scrapy spiders ☆3,087 Mar 2, 2026 Updated 3 weeks ago
- Retrieves Zhihu content, including questions, answers, users, and collections ☆2,324 Feb 8, 2022 Updated 4 years ago
- Scrapy+Splash for JavaScript integration ☆3,233 Feb 11, 2025 Updated last year
- ☆23 Jan 31, 2015 Updated 11 years ago
- A high-level distributed crawling framework. ☆1,505 Jul 31, 2022 Updated 3 years ago
- Scrapy examples crawling Craigslist ☆201 Apr 20, 2016 Updated 9 years ago
- MongoDB pipeline for Scrapy. This module supports both MongoDB in standalone setups and replica sets. scrapy-mongodb will insert the item… ☆358 Apr 6, 2021 Updated 4 years ago
- HTML Content / Article Extractor, web scraping lib in Python ☆4,070 Mar 10, 2026 Updated last week
- Python SDK for the WeChat Official Accounts Platform [DEPRECATED] ☆1,350 Oct 1, 2020 Updated 5 years ago
- ☆167 Nov 3, 2018 Updated 7 years ago
- Jieba Chinese text segmentation ☆34,813 Aug 21, 2024 Updated last year
- A simple, yet elegant, HTTP library. ☆53,882 Mar 5, 2026 Updated 2 weeks ago
- A web spider for zhihu.com ☆725 Jan 17, 2024 Updated 2 years ago
- Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed. ☆22,406 Updated this week
- Fill HTML login forms automatically ☆280 Apr 24, 2024 Updated last year
- Sina Weibo crawler (Scrapy, Redis) ☆3,279 Sep 5, 2018 Updated 7 years ago
- The Python micro framework for building web applications. ☆71,365 Mar 8, 2026 Updated 2 weeks ago
- Collection of Scrapy utilities (extensions, middlewares, pipelines, etc) ☆33 Feb 22, 2018 Updated 8 years ago
- A middleware for Scrapy, used to change the HTTP proxy from time to time. ☆322 Feb 1, 2018 Updated 8 years ago
- A JD.com (Jingdong) crawler written with Scrapy ☆451 Dec 5, 2014 Updated 11 years ago
- A pure-python HTML screen-scraping library ☆1,887 Apr 4, 2022 Updated 3 years ago
- Random proxy middleware for Scrapy ☆1,671 Oct 1, 2019 Updated 6 years ago
- Simulated login for a number of well-known websites, to make it easier to crawl sites that require login ☆5,883 Jun 8, 2018 Updated 7 years ago
- A complete and graceful API for Wechat. A WeChat personal account interface, WeChat bot, and command-line WeChat; a custom personal-account bot takes as few as thirty lines. ☆26,675 Sep 28, 2023 Updated 2 years ago