Scrapy project to scrape public web directories (educational) [DEPRECATED]
☆1,630Oct 27, 2017Updated 8 years ago
Alternatives and similar repositories for dirbot
Users that are interested in dirbot are comparing it to the libraries listed below
- Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.☆3,266Nov 3, 2023Updated 2 years ago
- Chinese translation of the Scrapy documentation☆1,107Sep 12, 2019Updated 6 years ago
- This is a sample Scrapy project for educational purposes☆1,354Nov 29, 2023Updated 2 years ago
- A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: a MongoDB cluster for underlying storage, Redis for distribution, and Graphite for displaying crawler status☆3,251Apr 18, 2017Updated 8 years ago
- Scrapy, a fast high-level web crawling & scraping framework for Python.☆60,007Feb 23, 2026Updated last week
- Redis-based components for Scrapy.☆5,643Jul 6, 2024Updated last year
- A Powerful Spider(Web Crawler) System in Python.☆17,002Apr 30, 2024Updated last year
- scrapy examples for crawling zhihu and github☆223Jan 11, 2023Updated 3 years ago
- Visual scraping for Scrapy☆9,493Jun 26, 2024Updated last year
- Scrapy extension to control spiders using JSON-RPC☆300Aug 26, 2019Updated 6 years ago
- A service daemon to run Scrapy spiders☆3,085Jan 16, 2026Updated last month
- Scrapy+Splash for JavaScript integration☆3,241Feb 11, 2025Updated last year
- A high-level distributed crawling framework.☆1,505Jul 31, 2022Updated 3 years ago
- A dynamically configurable news crawler based on Scrapy☆164Jul 24, 2017Updated 8 years ago
- A Scrapy spider that crawls cnblogs list pages☆274Jun 16, 2015Updated 10 years ago
- This repository stores some examples for learning Scrapy better☆177Oct 9, 2020Updated 5 years ago
- Fetches Zhihu content, including questions, answers, users, and favorites☆2,321Feb 8, 2022Updated 4 years ago
- MongoDB pipeline for Scrapy. This module supports both MongoDB in standalone setups and replica sets. scrapy-mongodb will insert the item…☆358Apr 6, 2021Updated 4 years ago
- WEIBO_SCRAPY is a Multi-Threading SINA WEIBO data extraction Framework in Python.☆155Jul 28, 2017Updated 8 years ago
- HTML Content / Article Extractor, web scraping lib in Python☆4,063Dec 26, 2021Updated 4 years ago
- A simple, yet elegant, HTTP library.☆53,833Feb 22, 2026Updated last week
- Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.☆22,404Jan 23, 2026Updated last month
- Fill HTML login forms automatically☆279Apr 24, 2024Updated last year
- Jieba Chinese word segmentation☆34,768Aug 21, 2024Updated last year
- A web spider for zhihu.com☆725Jan 17, 2024Updated 2 years ago
- A pure-python HTML screen-scraping library☆1,886Apr 4, 2022Updated 3 years ago
- Sina Weibo crawler (Scrapy, Redis)☆3,280Sep 5, 2018Updated 7 years ago
- The Python micro framework for building web applications.☆71,285Feb 20, 2026Updated last week
- Command line client for Scrapyd server☆778Dec 15, 2025Updated 2 months ago
- Python SDK for the WeChat Official Account Platform [DEPRECATED]☆1,350Oct 1, 2020Updated 5 years ago
- Scrapy examples crawling Craigslist☆201Apr 20, 2016Updated 9 years ago
- admin ui for scrapy/open source scrapinghub☆2,777May 4, 2023Updated 2 years ago
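- A complete and graceful API for Wechat. A WeChat personal-account interface, WeChat bot, and command-line WeChat; thirty lines are enough to build a custom personal-account bot.☆26,663Sep 28, 2023Updated 2 years ago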
- A JD.com crawler written with Scrapy☆452Dec 5, 2014Updated 11 years ago
- Random proxy middleware for Scrapy☆1,672Oct 1, 2019Updated 6 years ago
- Chinese edition of the official documentation for TensorFlow, Google's newly open-sourced artificial intelligence system☆12,385Aug 4, 2019Updated 6 years ago