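"奇伢爬虫" is a crawler built on Spring Boot and WebMagic that scrapes articles from WeChat official accounts, news sites, CSDN, info, and other websites. Crawl rules and cleaning rules can be configured dynamically, which lets it crawl articles from most websites.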
☆323 · Sep 3, 2017 · Updated 8 years ago
Alternatives and similar repositories for javaCrawling
Users that are interested in javaCrawling are comparing it to the libraries listed below.
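- WeChat official account article crawler ☆43 · Sep 1, 2022 · Updated 3 years ago
- A hands-on Java crawler framework developed on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (integrated as a separate feature), and others; combined with elasticsearch, it implements automated crawling and is already in production use. ☆339 · Nov 16, 2022 · Updated 3 years ago
- A Java crawler based on webmagic + springboot + mybatis that uses Echarts for data visualization and analysis. It provides a complete solution template covering data acquisition by the crawler, data persistence, data visualization and analysis, and building a simple proxy pool. ☆367 · Oct 26, 2017 · Updated 8 years ago
- A Map/Reduce-based crawler that extracts the body text of articles from major news sites and performs classification and clustering ☆73 · Jan 5, 2014 · Updated 12 years ago
- Java crawler covering anti-crawler strategies, ETL data cleaning, and Spark offline and real-time analysis of news, with results stored in ES ☆19 · Nov 26, 2018 · Updated 7 years ago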
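- Weather crawler (automatically scrapes and updates weather for cities and towns nationwide on a schedule, and exposes a RESTful query API), with a bundled proxy IP pool that is refreshed on a schedule and checked for availability ☆364 · Jun 25, 2018 · Updated 8 years ago
- Uses a Java web crawler to scrape data from the Chongqing University news site and builds a news website from the parsed data ☆11 · Mar 7, 2016 · Updated 10 years ago
- Sohu news crawler written in Java ☆14 · May 2, 2017 · Updated 9 years ago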
- A simple, agile, distributed Java crawler framework with SpringBoot support. ☆1,993 · Updated this week
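- A Java crawler application based on webmagic ☆2,775 · Jan 8, 2022 · Updated 4 years ago
- WeChat official account article scraping, implemented in Java ☆10 · Apr 14, 2017 · Updated 9 years ago
- Sina News crawler ☆15 · Feb 14, 2015 · Updated 11 years ago
- Scrapes and downloads from online video sites; supports Youku, iQiyi, Youtube, LeTV, and others ☆12 · Jul 12, 2017 · Updated 8 years ago
- zhihu-crawler is a high-performance, Java-based distributed crawler project that supports a free HTTP proxy pool and horizontal scaling ☆920 · Apr 2, 2019 · Updated 7 years ago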
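- Sina Weibo crawler developed in Java, based on HTTPClient 4.0, storing crawled data in MySQL, with support for concurrent multi-process execution. Features include crawling posts, comments, reposts, and following lists (hierarchical). Updated continuously according to data needs... ☆356 · Feb 27, 2014 · Updated 12 years ago
- Lianjia house spider: a crawler for Lianjia second-hand housing listings ~ Springboot + Webmagic + Mysql + Redis ☆27 · Apr 22, 2021 · Updated 5 years ago
- Crawler for the Toutiao technology news API ☆17 · Sep 26, 2017 · Updated 8 years ago
- Douban crawler that scrapes popular tags, book information, and book reviews. System architecture: Webmagic+SSM+Redis+Mysql+ActiveMQ+Druid ☆45 · Apr 24, 2019 · Updated 7 years ago
- Data crawler for 百科名医, covering departments, diseases, symptoms, examinations, and other categories, including the medical encyclopedia. ☆11 · Dec 21, 2017 · Updated 8 years ago
- A configurable web spider with an easy-to-use web console ☆997 · Jun 3, 2026 · Updated 3 weeks ago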
- Spark hybrid recommendation system big data monitoring platform ☆11 · May 1, 2018 · Updated 8 years ago
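- Tmall crawler ☆17 · Feb 4, 2013 · Updated 13 years ago
- Utility classes implementing basic functions commonly used in projects, such as image processing, file operations, SMS sending, map positioning, distance calculation, MD5, QR code generation and parsing, online payment, message push, JSON processing, and more. ☆11 · Jan 4, 2016 · Updated 10 years ago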
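- SpringBoot + Solr + webmagic: crawls JD product data and loads it into solr for search; a project for learning how to use solr ☆45 · Aug 31, 2017 · Updated 8 years ago
- A scalable web crawler framework for Java. ☆11,681 · Dec 20, 2025 · Updated 6 months ago
- Douban movie crawler: a crawler which is able to crawl movie detail and short comments, save them to database mysql, also include Sentiment analysis ba… ☆69 · Mar 24, 2019 · Updated 7 years ago
- Sina Weibo crawler based on WebCollector, plus related login tools such as Sina Weibo cookie retrieval ☆14 · Nov 21, 2018 · Updated 7 years ago
- A simple Java community site built with Spring Boot ☆897 · Jun 17, 2022 · Updated 4 years ago
- NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks. ☆647 · Jun 5, 2026 · Updated 3 weeks ago
- Implements audio and video capture, encoding, and rtmp push streaming ☆15 · Dec 22, 2015 · Updated 10 years ago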
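- A general-purpose admin system with separated front end and back end; the front end is based on Ant Design Pro-Vue and the back end on SpringBoot 2.x. Includes: scheduled task management, JWT-based authentication combined with Shiro, one-click generation of front-end and back-end CRUD code, RBAC-based access control, user management, organizational structure, various logs (operation logs/scheduled task logs/login… ☆14 · Dec 10, 2022 · Updated 3 years ago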
- springboot+vue page REST code generator that generates CRUD code for a single table ☆30 · May 14, 2023 · Updated 3 years ago
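- WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page … ☆18 · Apr 11, 2018 · Updated 8 years ago
- An example that uses WebMagic to scrape job postings and persist them to Mysql. ☆222 · Nov 22, 2016 · Updated 9 years ago
- An admin system built on SpringMVC4+EasyUI, already in production use. Download and import the SQL script and it works out of the box; deployment takes five minutes. ☆146 · Dec 16, 2022 · Updated 3 years ago
- Multithreaded crawler that scrapes Taobao product detail page URLs ☆130 · Dec 26, 2018 · Updated 7 years ago
- Microservice-based fire protection IoT cloud platform ☆23 · Nov 16, 2022 · Updated 3 years ago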
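- micro-mall is an e-commerce system comprising a storefront mall system and a back-office management system, built on Alibaba Spring Cloud + MyBatis. The storefront includes modules such as home portal, product recommendations, product search, product display, shopping cart, order flow, member center, customer service, and help center. The back-office management system includes product… ☆17 · Jun 21, 2022 · Updated 4 years ago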
- WeChat official account crawler: server-side collection of official account article data ☆44 · Dec 16, 2022 · Updated 3 years ago