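A roundup of web crawlers, spiders, data collectors, and web page parsers. Because new technologies keep developing and new frameworks keep emerging, this list will be updated continuously...
☆333 · Oct 7, 2022 · Updated 3 years ago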
Alternatives and similar repositories for awesome-crawler-cn
Users that are interested in awesome-crawler-cn are comparing it to the libraries listed below.
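- Scraping and Web Crawling Framework For Zhihu Live ☆63 · Oct 10, 2017 · Updated 8 years ago
- A spider... ^.^ ☆100 · Mar 23, 2014 · Updated 12 years ago
- More and more websites have anti-crawler features: some hide key data in images, others use user-hostile CAPTCHAs. This is a code repository for countering anti-crawler measures, built to improve skills by tackling websites with different characteristics (no malicious intent). (Submissions of hard-to-scrape websites are welcome.) (Project paused due to work commitments.) ☆7,293 · Oct 17, 2021 · Updated 4 years ago
- A distributed web crawler built with Scrapy, Redis, MongoDB, and Graphite: a MongoDB cluster provides the underlying storage, Redis handles distribution, and Graphite displays crawler status ☆3,244 · Apr 18, 2017 · Updated 9 years ago
- Scrapy extension to write scraped items using Django models ☆503 · Oct 15, 2023 · Updated 2 years ago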
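- 蓝天采集器 is a free, open-source crawler system. Data is collected simply by pointing and clicking to edit rules; it runs locally, on virtual hosts, or on cloud servers, can collect almost any type of web page, integrates seamlessly with various CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. It is a fully cross-platform, cloud-based crawler system for web big-data collection ☆2,068 · Updated this week
- 🇨🇳 Translation: <awesome-puppeteer>, a curated list of Puppeteer resources ❤️ Proofread ✅ ☆23 · Mar 29, 2019 · Updated 7 years ago
- More and more websites have anti-crawler features: some hide key data in images, others use user-hostile CAPTCHAs. This is a code repository for countering anti-crawler measures, built to improve skills by tackling websites with different characteristics (no malicious intent). (Submissions of hard-to-scrape websites are welcome.) ☆23 · Dec 9, 2016 · Updated 9 years ago
- A crawler framework written in Node.js ☆18 · May 24, 2017 · Updated 8 years ago
- A captcha library that generates audio and image CAPTCHAs. ☆1,091 · Oct 21, 2025 · Updated 6 months ago
- ☆61 · Jan 6, 2017 · Updated 9 years ago
- A powerful tool to simulate millions of concurrent users for load testing ☆20 · Mar 10, 2014 · Updated 12 years ago
- Visual scraping for Scrapy ☆9,491 · Jun 26, 2024 · Updated last year
- SlimIt - a JavaScript minifier/parser in Python ☆548 · Jul 30, 2019 · Updated 6 years ago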
- Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.… ☆3,398 · Feb 19, 2025 · Updated last year
- Specifically designed for web crawlers that need to log in to a website when collecting Internet web data, by using some Simulated… ☆14 · Nov 30, 2016 · Updated 9 years ago
- A Powerful Spider(Web Crawler) System in Python. ☆16,844 · Apr 30, 2024 · Updated 2 years ago
- A proxy IP pool that uses Django 2 as the API backend and Scrapy as the crawler ☆10 · Jun 6, 2020 · Updated 5 years ago
- Python ProxyPool for web spider ☆23,322 · Mar 27, 2026 · Updated last month
- ☕ Homemade API ☆16 · Dec 19, 2025 · Updated 4 months ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,228 · Nov 7, 2023 · Updated 2 years ago
- Login methods for major websites; some log in through Selenium, others simulate the login directly from captured packets (no longer maintained for lack of time) ☆1,008 · Jul 26, 2022 · Updated 3 years ago
- A browser-extension-based tool that mimics manual operation and works as a near-universal web scraper. It only requires installing an extension of a few dozen KB, is extremely simple to use and efficient, and lets anyone collect the data they want from the Internet ☆11 · May 21, 2019 · Updated 6 years ago
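- ☆16 · Aug 10, 2016 · Updated 9 years ago
- Unofficial API for zhihu. ☆46 · May 5, 2017 · Updated 8 years ago
- Lets you configure a custom acceleration server. Aimed at WordPress users in China, it speeds up site version updates and plugin and theme installs and upgrades, and replaces Gravatar avatar links. ☆11 · Aug 8, 2020 · Updated 5 years ago
- Distributed Zhihu crawler (Scrapy, Redis) ☆169 · Feb 18, 2018 · Updated 8 years ago
- Distributed web crawler admin platform for spider management regardless of languages and frameworks. ☆12,200 · Feb 10, 2026 · Updated 2 months ago
- Simulated logins for some well-known websites, to make it easier to crawl sites that require login ☆5,879 · Jun 8, 2018 · Updated 7 years ago
- VY-netcat is a network tool written in vlang, mainly used for building CTF problem environments, and will be integ… ☆10 · Oct 20, 2024 · Updated last year
- A Chinese-language edition of the awesome Python resource list, covering web frameworks, web crawlers, template engines, databases, data visualization, image processing, and more; maintained and updated by the WeChat official account teams of 「开源前哨」 and 「Python开发者」. ☆30,358 · Aug 29, 2022 · Updated 3 years ago
- WeChat official account crawler ☆3,323 · Aug 10, 2021 · Updated 4 years ago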
- Wechat Management System ☆1,748 · May 17, 2018 · Updated 7 years ago
- Just a DEMO to demonstrate how to use JNA to type chars into alipay's password edit control automatically. ☆12 · Dec 21, 2017 · Updated 8 years ago
- Node.js crawler that automatically generates a sitemap for a given website ☆12 · Mar 27, 2018 · Updated 8 years ago
- Visual platform for automated crawler data collection ☆187 · Feb 27, 2023 · Updated 3 years ago
- A configurable web spider with an easy-to-use web console ☆998 · Aug 21, 2018 · Updated 7 years ago
- Nacollector is a platform for web data collection. ☆193 · Jan 6, 2025 · Updated last year
- Output scrapy statistics to graphite/carbon ☆54 · Mar 9, 2013 · Updated 13 years ago