zhangslob/awesome_crawl

zhangslob / awesome_crawl

Tencent News, Zhihu topics, Weibo followers, Tumblr crawler, Douyu danmaku (bullet comments), Meizitu crawler, distributed design, and more

☆303

Alternatives and similar repositories for awesome_crawl

Users who are interested in awesome_crawl are comparing it to the repositories listed below.

Yanxueshan / Scrapy-Redis-Zhihu
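A distributed crawler built on scrapy-redis that crawls all Zhihu questions and their answers. Integrates Selenium-based simulated login, recognition of English CAPTCHAs and inverted-character CAPTCHAs, random User-Agent generation, IP proxies, handling of 302 redirects, and more.
☆61 · Apr 3, 2019 · Updated 7 years ago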
zyingzhou / zuiyouSpider
Crawler for the Zuiyou app; uses Python to crawl Zuiyou's joke posts and video danmaku (bullet comments).
☆22 · Jun 29, 2019 · Updated 7 years ago
Jacen789 / NewsCrawler
News crawler that crawls real-time financial news from Sina, Sohu, and Xinhuanet.
☆196 · May 9, 2020 · Updated 6 years ago
yinzishao / NewsScrapy
A Scrapy-based news crawler.
☆101 · Apr 18, 2020 · Updated 6 years ago
starryrbs / awesome-scrapy
Hands-on Scrapy tutorial that shares Scrapy crawling knowledge, with crawlers for major websites explained through example code.
☆10 · Jan 22, 2026 · Updated 6 months ago
TauWu / weibo_daily_hotkey
Weibo's daily TOP5 hotkey. Automatically crawls and filters the daily top 5 trending search terms on Sina Weibo. https://github.com/TauWu/weibo_daily_hotkey/blob/master/data/data.md
☆36 · Apr 18, 2021 · Updated 5 years ago
Harhao / toutiao
Crawler for the Toutiao technology news API.
☆17 · Sep 26, 2017 · Updated 8 years ago
zhangslob / docs
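Source code for "Data Collection: From Getting Started to Giving Up" (《数据采集从入门到放弃》). Contents: introduction to crawlers, the job market, and interview questions for crawler engineers; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; using n…
☆139 · Jun 26, 2019 · Updated 7 years ago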
HEUDavid / WeiboSpider
Weibo crawler. Feel free to raise any issues.
☆17 · Jul 2, 2019 · Updated 7 years ago
shipengtaov / weibo_sentiment
Sentiment analysis of Weibo followers.
☆44 · May 28, 2017 · Updated 9 years ago
Jaysong2012 / tutorial
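Hands-on Scrapy crawler series: crawling content from major sites such as Tencent, Baidu, Taobao, and Zhihu from scratch. Also includes a 12306 ticket-grabbing script series.
☆80 · Apr 2, 2019 · Updated 7 years ago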
Wooden-Robot / scrapy-tutorial
Source code for a Scrapy crawler framework tutorial.
☆109 · Aug 23, 2019 · Updated 6 years ago
Jannchie / simpyder
Ultra-fast asynchronous, coroutine-based Python crawler.
☆80 · Feb 15, 2023 · Updated 3 years ago
Randy-whiteSugar / LagouSpider_Scrapy
A Lagou crawler written with Scrapy, with an added proxy IP pool and an incremental crawling mechanism.
☆11 · May 22, 2023 · Updated 3 years ago
jfzhang95 / news_spider
News crawler (Tencent, NetEase, Sina, Toutiao, Sohu, ifeng.com, Tencent rolling news).
☆58 · Jun 6, 2018 · Updated 8 years ago
EruDev / BlogDoc
A collection of my blog posts.
☆10 · Sep 3, 2018 · Updated 7 years ago
yangbendong / GoogleTranslator
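A free API for Google Translate. Free Google translation, identical to the Google Translate web version, with optional servers in mainland China. Personally tested at 3 million characters per day without problems.
☆13 · Nov 22, 2019 · Updated 6 years ago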
markusleevip / go-shici
Go crawler that crawls poetry websites and generates poetry images.
☆19 · Jan 6, 2020 · Updated 6 years ago
97CBR / SoftwareUpdateServer
Software Update Server: a software update service.
☆22 · Jul 9, 2024 · Updated 2 years ago
chenjiandongx / bili-spider
📺 Crawler for video information across all of Bilibili.
☆698 · Feb 17, 2019 · Updated 7 years ago
Gerapy / GerapyProxy
A package for supporting proxy in Scrapy & Gerapy
☆11 · Jul 15, 2020 · Updated 6 years ago
suetming / usa_stock_data_crawler
US stock data crawler (NASDAQ, AMEX, NYSE).
☆17 · Nov 24, 2016 · Updated 9 years ago
zhanghe06 / news_spider
News scraping (WeChat, Weibo, Toutiao...)
☆225 · Dec 8, 2022 · Updated 3 years ago
facert / tumblr_spider
Multithreaded Python crawler for Tumblr.
☆462 · Jul 22, 2020 · Updated 6 years ago
zkzhang1986 / -Scrapy-
Code from the book "Mastering Scrapy Web Crawlers" (《精通scrapy网络爬虫》).
☆10 · May 15, 2020 · Updated 6 years ago
rayzgithub / ZhiHuSpider
A self-written crawler that crawls Zhihu questions and answers.
☆39 · Jun 10, 2019 · Updated 7 years ago
nciefeiniu / wenshu
🚀🚀 Cookie retrieval for the Wenshu site (文书网); still working as of 2020-08-23. (Discontinued)
☆51 · Aug 23, 2020 · Updated 5 years ago
AlexTan-b-z / ZhihuSpider
Distributed Zhihu crawler (Scrapy, Redis).
☆169 · Feb 18, 2018 · Updated 8 years ago
15920036578 / JD_Spider
JD.com crawler (heavily commented and very friendly to crawler beginners).
☆72 · Apr 19, 2019 · Updated 7 years ago
shikanon / proxy_scrapy
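proxy_scrapy is a proxy module built with Scrapy, consisting of three parts: proxy scraping, proxy testing, and proxy usage. It covers scraping of the major proxy sites and testing of proxy stability, and integrates into Scrapy crawlers.
☆10 · Jan 20, 2017 · Updated 9 years ago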
Python3WebSpider / DouYin
API of DouYin for Humans used to Crawl Popular Videos and Musics
☆653 · Jan 29, 2020 · Updated 6 years ago
NightMarcher / zhihu-crawler
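A from-scratch implementation that crawls Zhihu on a schedule, extracts valuable information from it, and visualizes the crawled data on a web page.
☆67 · Mar 27, 2023 · Updated 3 years ago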
orangeMask / spider
Crawlers for Douyin, Taobao-family sites, and common news sites.
☆13 · Apr 15, 2022 · Updated 4 years ago
wnma3mz / wechat_articles_spider
Crawler for WeChat official account articles.
☆3,474 · Apr 18, 2024 · Updated 2 years ago
tankle / newscrawler
News site crawler; currently crawls news pages from NetEase, Sina, QQ, and Sohu and saves them locally.
☆34 · Jun 12, 2015 · Updated 11 years ago
gyqlr / weibo_spider
Weibo crawler: crawls Weibo corpus data; sentiment analysis, User-Agent pool, ample IPs, Scrapy, MongoDB.
☆15 · Aug 23, 2018 · Updated 7 years ago
HegemonyTao / crawlProject
Crawlers for Toutiao, Taobao, Weibo, Douyu, Douyin, Bilibili, Youdao Translate, the Steam website, and NetEase Cloud Music.
☆62 · Apr 17, 2020 · Updated 6 years ago
yingjinghuang / WeiboCrawler
Sina Weibo crawler.
☆81 · Jul 5, 2024 · Updated 2 years ago
shisiying / tc_zufang
A distributed web crawler built with Scrapy, Redis, MongoDB, and Django: MongoDB provides the underlying storage, Redis handles distribution, and Django visualizes the crawler.
☆280 · May 1, 2018 · Updated 8 years ago