qinxuye/cola

qinxuye / cola

A high-level distributed crawling framework.

☆1,500

Alternatives and similar repositories for cola

Users that are interested in cola are comparing it to the libraries listed below.

gnemoug / distribute_crawler
A distributed web crawler built with scrapy, redis, mongodb, and graphite: a mongodb cluster provides the underlying storage, redis handles distribution, and graphite displays crawler status.
☆3,243 · Apr 18, 2017 · Updated 9 years ago
binux / pyspider
A Powerful Spider(Web Crawler) System in Python.
☆16,797 · Apr 30, 2024 · Updated 2 years ago
scrapinghub / portia
Visual scraping for Scrapy
☆9,506 · Jun 26, 2024 · Updated 2 years ago
jmg / crawley
Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.
☆193 · Updated this week
lorien / grab
Web Scraping Framework
☆2,461 · Sep 19, 2025 · Updated 10 months ago
rmax / scrapy-redis
Redis-based components for Scrapy.
☆5,645 · May 19, 2026 · Updated 2 months ago
grangier / python-goose
HTML Content / Article Extractor, web scraping lib in Python
☆4,100 · Mar 10, 2026 · Updated 4 months ago
egrcc / zhihu-python
Fetches Zhihu content, including information on questions, answers, users, and collections.
☆2,332 · Feb 8, 2022 · Updated 4 years ago
jmcarp / robobrowser
☆3,695 · Sep 10, 2020 · Updated 5 years ago
geekan / scrapy-examples
Multifarious Scrapy examples. Spiders for alexa / amazon / douban / douyu / github / linkedin etc.
☆3,254 · Nov 3, 2023 · Updated 2 years ago
LiuXingMing / SinaSpider
Sina Weibo crawler (Scrapy, Redis)
☆3,285 · Sep 5, 2018 · Updated 7 years ago
2shou / PhantomjsFetcher
A python web fetcher using phantomjs to mock browser
☆180 · Oct 10, 2017 · Updated 8 years ago
scrapinghub / frontera
A scalable frontier for web crawlers
☆1,332 · Jun 6, 2025 · Updated last year
rq / rq
Simple job queues for Python
☆10,667 · Updated this week
istresearch / scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
☆1,225 · Nov 7, 2023 · Updated 2 years ago
scrapy / scrapyd
A service daemon to run Scrapy spiders
☆3,097 · Updated this week
qiyeboy / IPProxyPool
IPProxyPool, a proxy pool project that provides proxy IPs
☆4,280 · Jul 13, 2018 · Updated 8 years ago
princehaku / pyrailgun
Simple And Easy Python Crawler Framework: a simple, practical, and efficient Python web crawling module that supports scraping JavaScript-rendered pages
☆379 · Sep 3, 2021 · Updated 4 years ago
DormyMo / SpiderKeeper
admin ui for scrapy/open source scrapinghub
☆2,768 · May 4, 2023 · Updated 3 years ago
scrapy / scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
☆63,273 · Updated this week
scrapy-plugins / scrapy-splash
Scrapy+Splash for JavaScript integration
☆3,229 · Feb 11, 2025 · Updated last year
douban / dpark
Python clone of Spark, a MapReduce alike framework in Python
☆2,663 · Dec 25, 2020 · Updated 5 years ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
☆15,114 · Updated this week
xchaoinfo / fuck-login
Simulates logging in to some well-known websites, to make it easier to crawl sites that require login
☆5,870 · Jun 8, 2018 · Updated 8 years ago
scrapy / scrapely
A pure-python HTML screen-scraping library
☆1,884 · Apr 4, 2022 · Updated 4 years ago
yoyzhou / weibo_scrapy
WEIBO_SCRAPY is a Multi-Threading SINA WEIBO data extraction Framework in Python.
☆155 · Jun 3, 2026 · Updated last month
MechanicalSoup / MechanicalSoup
A Python library for automating interaction with websites.
☆4,876 · Updated this week
scrapinghub / splash
Lightweight, scriptable browser as a service with an HTTP API
☆4,190 · Aug 2, 2024 · Updated last year
wuchong / scrapy-dynamic-configurable
A dynamic configurable news crawler based on Scrapy
☆164 · Jul 24, 2017 · Updated 8 years ago
spyoungtech / grequests
Requests + Gevent = <3
☆4,575 · Aug 8, 2024 · Updated last year
isnowfy / snownlp
Python library for processing Chinese text
☆6,630 · Jan 19, 2020 · Updated 6 years ago
pricingassistant / mrq
Mr. Queue - A distributed worker task queue in Python using Redis & gevent
☆894 · Jun 13, 2023 · Updated 3 years ago
andeya / pholcus
Pholcus is a distributed high-concurrency crawler software written in pure golang
☆7,578 · Mar 3, 2026 · Updated 4 months ago
manning23 / MSpider
Spider
☆345 · Jul 11, 2022 · Updated 4 years ago
gevent / gevent
Coroutine-based concurrency library for Python
☆6,442 · Jul 2, 2026 · Updated 2 weeks ago
FullerHua / gooseeker
☆693 · Oct 26, 2016 · Updated 9 years ago
JobsDong / tigerspider
tigerspider: a fast high-level screen scraping and web crawling framework for Python.
☆33 · May 25, 2015 · Updated 11 years ago
wangshunping / weibo_spider
Graduate project: a Weibo spider to find interesting information, such as "In social networks, people tend to be happy or sad."
☆272 · Apr 10, 2016 · Updated 10 years ago
luyishisi / Anti-Anti-Spider
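More and more websites have anti-crawler features: some hide key data in images, and some use user-hostile CAPTCHAs. This is a code repository for anti-anti-crawling, built to improve technique by contending (without malice) with websites that have different characteristics. (Submissions of hard-to-scrape websites are welcome.) (Project paused due to work commitments.)
☆7,285 · Oct 17, 2021 · Updated 4 years ago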