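A multithreaded, distributed crawler built with Java, httpclient, and httpcleaner that scrapes product information from e-commerce sites. Data is stored in hbase, products are indexed with solr, a redis queue holds a shared URL repository, and zookeeper monitors the lifecycle of the crawler nodes.
☆233 · Nov 6, 2020 · Updated 5 years ago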
Alternatives and similar repositories for spider
Users that are interested in spider are comparing it to the libraries listed below.
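- A simple distributed spider written in Java. ☆159 · Jun 18, 2013 · Updated 12 years ago
- A Sina Weibo crawler written in Java, based on HTTPClient 4.0, that stores crawled data in MySQL and supports concurrent multi-process execution. Features include crawling posts, comments, reposts, and follow lists (hierarchical). Continuously updated according to data needs... ☆356 · Feb 27, 2014 · Updated 12 years ago
- Data scraping and analysis for major e-commerce sites ☆32 · Sep 17, 2013 · Updated 12 years ago
- A shared Excel export component ☆13 · Jan 28, 2016 · Updated 10 years ago
- A lightweight event-driven, asynchronous framework ☆39 · Sep 1, 2024 · Updated last year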
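- A hands-on Java crawler framework built on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (integrated as a separate feature), and others; combined with elasticsearch, it implements automated crawling and is already used in production. ☆341 · Nov 16, 2022 · Updated 3 years ago
- A Map/Reduce-based crawler that extracts article bodies from major news sites and performs classification and clustering ☆74 · Jan 5, 2014 · Updated 12 years ago
- Collected crawler project source code that uses redis for URL caching, hbase for storing detailed information, and zookeeper for monitoring the state of crawler threads. ☆19 · Oct 7, 2015 · Updated 10 years ago
- A crawler for Taobao product reviews ☆26 · Feb 29, 2016 · Updated 10 years ago
- ActiveMQ Sample for Java. ☆16 · Dec 6, 2011 · Updated 14 years ago
- A Java crawler system developed with spring boot + webmagic ☆61 · Dec 29, 2016 · Updated 9 years ago
- Scraping 58同城 (58.com) data with the WebMagic framework ☆12 · Oct 13, 2014 · Updated 11 years ago
- Centralized log4j log management using kafka ☆14 · Jan 6, 2021 · Updated 5 years ago
- A simple payment system developed with dubbo distributed transactions ☆52 · Aug 26, 2016 · Updated 9 years ago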
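- A logging demo based on spring mvc + redis + logback + elk ☆12 · Feb 23, 2017 · Updated 9 years ago
- Source code for the book 《Java多线程编程实战指南(设计模式篇)》 (a practical guide to Java multithreaded programming, design patterns volume) ☆663 · Mar 16, 2020 · Updated 6 years ago
- Scrapes newspaper content from various publishers; a simple, customizable crawler driven by configuration files ☆11 · Sep 1, 2022 · Updated 3 years ago
- springboot-dubbox back-office administration ☆195 · Feb 11, 2017 · Updated 9 years ago
- 丽人 (Liren) beauty and cosmetics mall e-commerce platform ☆29 · Dec 20, 2015 · Updated 10 years ago
- A Java web crawler for downloading novels, built with httpclient, jsoup, dom4j, json-lib, and SWT. ☆19 · Jul 10, 2018 · Updated 7 years ago
- This shows how to embed Hystrix in a non-invasive manner into existing Spring applications. ☆24 · May 5, 2014 · Updated 11 years ago
- Showcases of distributed remote invocation in Java, including RMI, CXF, Burlap, Hessian, HttpInvoker, JMS, REST, MetaQ, and Dubbo. ☆98 · Aug 30, 2014 · Updated 11 years ago
- A library that lets spark bulk-import data into hbase ☆42 · Nov 18, 2016 · Updated 9 years ago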
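- An open-source base framework for back-end development, built on springmvc + spring + hibernate, with a front end using angular js + sea js + bootstrap. ☆11 · Jun 16, 2016 · Updated 9 years ago
- Concurrent programming ☆29 · Mar 18, 2024 · Updated 2 years ago
- A base scaffold for API services, using spring-boot, spring-session, mybatis, redis, quartz, and others; supports clustered deployment ☆57 · Mar 18, 2019 · Updated 7 years ago
- activemq+spring+jms ☆13 · Aug 1, 2013 · Updated 12 years ago
- Back end and management platform for a campus O2O e-commerce project, based on SpringMVC + spring + Mybatis ☆374 · Jun 21, 2022 · Updated 3 years ago
- An open-source implementation of the database and table sharding used in the 乐视集团 (LeTV group) payment order system ☆123 · Feb 23, 2017 · Updated 9 years ago
- A project that collects POI data through the Baidu Maps API and stores it in HBase. ☆25 · Mar 14, 2016 · Updated 10 years ago
- An application-layer framework for table and database sharding across distributed data sources, with read/write splitting ☆55 · Nov 5, 2015 · Updated 10 years ago
- A log statistics and analysis system based on spark streaming, kafka, and hbase ☆263 · Sep 5, 2017 · Updated 8 years ago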
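- GuozhongCrawler is an open-source crawler framework that requires no configuration and is easy to extend. It provides a simple, flexible API, so a crawler can be implemented with only a small amount of code. Its design draws on lessons from several crawler frameworks in China and abroad. It has a fully modular design covering the whole crawler lifecycle (link extraction, page download, content extraction, persistence), and supports multi… ☆102 · Apr 20, 2015 · Updated 10 years ago
- A distributed online chat system ☆10 · Sep 17, 2014 · Updated 11 years ago
- A seed project built on springMVC4, providing unified REST responses, exception handling, parameter validation, and more ☆28 · Oct 9, 2014 · Updated 11 years ago
- A distributed scaffold framework (summary and notes) ☆15 · Aug 27, 2015 · Updated 10 years ago
- A netty-based distributed chat server, integrated with zookeeper ☆74 · Jun 25, 2022 · Updated 3 years ago
- Uses dubbo for service registration, netty as the server, and springmvc to provide RESTful interfaces ☆244 · May 8, 2019 · Updated 6 years ago
- A web-novel crawler implemented with HttpClient4+, with support for dynamically adding popular novel sites ☆30 · Sep 6, 2012 · Updated 13 years ago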