striver-ing/distributed-spider

A general-purpose distributed crawler for news websites

☆79

Alternatives and similar repositories for distributed-spider

Users who are interested in distributed-spider are comparing it to the libraries listed below.

Martin-030621 / TouTiao_Selenium
Crawler for the Toutiao search engine and news detail pages (Selenium)
☆15 • Mar 13, 2025 • Updated last year
veelion / python-crawler
☆13 • Aug 31, 2023 • Updated 2 years ago
asyncins / qsmi
Questions in Spider Man Interview: common interview questions for web-crawler engineers
☆11 • Mar 9, 2019 • Updated 7 years ago
jfzhang95 / news_spider
News crawler (Tencent, NetEase, Sina, Toutiao, Sohu, Phoenix (ifeng.com), Tencent rolling news)
☆58 • Jun 6, 2018 • Updated 8 years ago
xiaolulu / mynodejs
This is a practical demo for learning nodejs
☆11 • Sep 17, 2015 • Updated 10 years ago
xyuns-cc / Scrapy_DrissionPage
A crawler project based on Scrapy and DrissionPage
☆24 • Mar 19, 2025 • Updated last year
striver-ing / headlines_today
Python-based crawler for Toutiao articles and videos
☆35 • Dec 15, 2016 • Updated 9 years ago
striver-ing / wechat-spider
Open-source WeChat crawler: scrapes all articles, read counts, like counts, and comments from official accounts. Easy to deploy. Actively maintained!
☆2,902 • May 29, 2026 • Updated 2 months ago
Python3WebSpider / ScrapyUniversal
Scrapy Universal Spider
☆57 • Aug 26, 2017 • Updated 8 years ago
orangeMask / spider
Crawlers for Douyin, Taobao-family sites, and common news sites
☆13 • Apr 15, 2022 • Updated 4 years ago
Gerapy / GerapyPlaywright
Downloader Middleware to support Playwright in Scrapy & Gerapy
☆111 • Mar 6, 2022 • Updated 4 years ago
digfound / sinacrawler
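A first Python web crawler, mainly using beautifulsoup4 to scrape the news list on the Sina News home page. It retrieves each item's title, time, source, body, comment count, and editor information, organizes the data with pandas, and saves it to a database.
☆13 • Dec 7, 2017 • Updated 8 years ago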
python-ruia / ruia-ua
Simple user-agent middleware for Ruia
☆11 • Dec 31, 2020 • Updated 5 years ago
irabbit666666 / ibox-wtoken-server
ibox-wtoken-server
☆22 • Jul 4, 2022 • Updated 4 years ago
crystal-tensor / spide
A web crawler that mainly collects stock data, foreign-exchange data, stock background information, and timely stock news
☆13 • Aug 13, 2018 • Updated 7 years ago
percent4 / CRF_4_NER
Using CRF++ for NER
☆20 • Feb 28, 2019 • Updated 7 years ago
jiangyuanyuan / lotterySpider
A news crawler built on the Scrapy framework
☆13 • Jul 26, 2019 • Updated 7 years ago
GeneralNewsExtractor / GneList
A Chrome extension for easily getting the XPath of list items in a webpage.
☆34 • Mar 11, 2022 • Updated 4 years ago
SCU-JJkinging / BERT-Chinese-NER-pytorch
PyTorch implementation of Chinese named entity recognition based on BERT+BiLSTM+CRF
☆44 • May 5, 2021 • Updated 5 years ago
dongrunhua / ScrapyUniversal
A general-purpose crawler framework based on Scrapy
☆25 • Jan 9, 2019 • Updated 7 years ago
zhaoboy9692 / qccspider
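Crawler for Qichacha company information: scrapes the companies newly added each day in the Qichacha app, supporting daily incremental crawls of company data, business registration data, and more.
☆332 • Dec 8, 2022 • Updated 3 years ago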
rdcprojects / scrapy-mq-redis
A RabbitMQ/Redis tool for Scrapy
☆13 • Oct 7, 2016 • Updated 9 years ago
imondo / news-crawler
A small Node crawler that scrapes local news
☆16 • May 2, 2024 • Updated 2 years ago
Ingram7 / NewsinaSpider
Scrapy crawler for Sina News
☆12 • Aug 26, 2019 • Updated 6 years ago
nkynp74dak / wenzi-sc-shipin
Automatic text-to-video generation: a roundup of AI tools for generating video from text
☆14 • Apr 17, 2025 • Updated last year
LiuXingMing / Scrapy_Redis_Bloomfilter
Redis-based Bloom filter deduplication, extended to the Scrapy framework.
☆347 • Feb 26, 2023 • Updated 3 years ago
brady-chen / tbNews
Incremental focused crawler for financial news
☆21 • Jul 17, 2017 • Updated 9 years ago
Ckend / GzhToBlog
[Official account crawler] Scrapes all articles from an official account into a blog database
☆13 • Jul 25, 2019 • Updated 7 years ago
perpetually2014 / Official_Accounts
Official accounts
☆10 • Jul 24, 2023 • Updated 3 years ago
Baiyuetribe / watermark-img
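Possibly the most convenient watermarking image host on the web; supports one-click deployment via BT Panel (宝塔) as well as Docker deployment to a server or local machine
☆10 • Jul 16, 2019 • Updated 7 years ago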
casual-silva / NewsCrawl
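An open-sourced enterprise-grade public-opinion news crawler project: supports one-click running of any number of crawlers, scheduled crawler tasks, and batch deletion of crawlers; one-click crawler deployment; visual crawler monitoring; configurable crawler allocation strategies for clusters; 👉 ready-made Docker one-click deployment docs with the pitfalls already worked through. an enterprise-grade public opinion news …
☆681 • May 23, 2026 • Updated 2 months ago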
FrankXiong / cqunews-web
A news website built from data parsed after crawling the Chongqing University news site with a Java web crawler
☆11 • Mar 7, 2016 • Updated 10 years ago
Johnson0722 / News_scrapy_redis
☆30 • Jul 5, 2018 • Updated 8 years ago
cxapython / Spider
Crawlers, anti-crawling, JS reverse engineering, Android reverse engineering, AST
☆12 • Sep 14, 2020 • Updated 5 years ago
lizongying / cron
Scheduled tasks implemented with a timing wheel: more punctual, with higher concurrent performance. Supports crontab format or the every 1 second|minute|hour|day|month|week format
☆16 • Nov 24, 2023 • Updated 2 years ago
rama291041610 / TongHuaShun-Spider
A crawler for Tonghuashun (同花顺) financial news.
☆16 • Apr 12, 2019 • Updated 7 years ago
Gerapy / GerapyAutoExtractor
Auto Extractor Module
☆338 • Aug 19, 2024 • Updated last year
vectorsss / news_classification
Automatic crawling and classification of NetEase News using a convolutional neural network and a crawler
☆13 • Dec 8, 2022 • Updated 3 years ago
qq3163450460 / js_rpc_drive
A universal JS reverse-engineering tool that needs no environment patching
☆36 • Aug 8, 2024 • Updated last year