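Distributed crawler framework: simulates user requests with WebDriver, passes messages through Kafka, stores web pages in HBase as distributed storage, and parses pages with asynchronous multithreaded tasks. It provides basic services such as a proxy IP service and a number verification service; the proxy page is accessed through H5 and web versions.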
☆13 Dec 18, 2015 Updated 10 years ago
Alternatives and similar repositories for crawler-framework
Users that are interested in crawler-framework are comparing it to the libraries listed below
- Chromium-based headless browser for Java ☆28 Oct 14, 2016 Updated 9 years ago
- 小黄鸡 ("Little Yellow Chicken") for Renren (人人网) ☆21 Jan 4, 2013 Updated 13 years ago
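- Hadoop Map/Reduce usage examples: MySQL database tables as input (DBInputFormat) and output (DBOutputFormat), log analysis (Grep), word sorting (Sort)... Basic HBase operations (create, delete, query, update), and bulk-importing data into HBase tables with Map/Reduce..… ☆14 Apr 6, 2013 Updated 12 years ago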
- Converts JSON or SQL into Flink or Spark streaming/batch jobs ☆12 Jun 21, 2022 Updated 3 years ago
- Flink 10 self-study notes and code ☆14 Jun 29, 2022 Updated 3 years ago
- A curated collection of security-related mind maps ☆11 Sep 7, 2015 Updated 10 years ago
- A SillyTavern character card creation tool built with AI-assisted programming ☆19 Jun 16, 2025 Updated 8 months ago
- ☆15 Aug 25, 2014 Updated 11 years ago
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ☆10 Nov 17, 2021 Updated 4 years ago
- Easy-to-use lightweight web crawler ☆10 Mar 21, 2016 Updated 9 years ago
- Zookeeper Monitoring Extension for AppDynamics ☆10 Sep 29, 2021 Updated 4 years ago
- A simple user information API built on Spring + MyBatis + Jetty. ☆11 Mar 13, 2015 Updated 10 years ago
- Data exchange ☆10 Jun 5, 2024 Updated last year
- Codec for Hadoop adding OpenPGP encryption using Bouncy Castle ☆17 Aug 18, 2011 Updated 14 years ago
- A music player that uses Douban channels and recommendations, with NetEase Cloud Music as its source ☆11 May 29, 2015 Updated 10 years ago
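- swift is a lightweight web framework that implements IoC, MVC, ORM, AOP, and RabbitMQ features. It is already usable, meets basic development and learning needs, and is well suited to understanding the basic principles of Spring. Features such as security management will be added gradually. If you want to read the source code, you can start from org.swift.fram… ☆11 Oct 24, 2023 Updated 2 years ago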
- An easy-to-use, scalable spark streaming ETL tool and sdk ☆13 Aug 14, 2017 Updated 8 years ago
- api gateway based on netty ☆12 Jun 14, 2018 Updated 7 years ago
- mx-chain-go common packages and high level definitions ☆12 Feb 19, 2026 Updated last week
- Cryptographic library in Java ☆10 Aug 19, 2024 Updated last year
- -- End-Of-Support: 23.07.2021 -- Archived Repo! -- Please use the IDS-Messaging-Services! -- ☆13 Jul 23, 2021 Updated 4 years ago
- Spring Data implementation for ElasticSearch ☆63 Feb 22, 2022 Updated 4 years ago
- ☆12 Dec 10, 2018 Updated 7 years ago
- Android UVPN, a handy tool for circumventing internet restrictions ☆10 Apr 6, 2017 Updated 8 years ago
- A storage format 5 to 100 times faster than Spark-Parquet ☆12 Feb 22, 2016 Updated 10 years ago
- Design patterns ☆10 Jun 13, 2023 Updated 2 years ago
- 蜜蜂牧场 ("Bee Ranch") is a data collection and cleaning tool, an ETL tool, and also a scripting language. ☆14 Jul 1, 2018 Updated 7 years ago
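- Sync is a safe, efficient Redis-based thread synchronization component for distributed scenarios. It provides a distributed reentrant mutex, a distributed reentrant read-write lock, and a distributed semaphore, along with corresponding annotations. It is simple to use and integrates seamlessly with spring-boot. ☆13 Oct 8, 2022 Updated 3 years ago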
- Scrapes proxy IPs and saves the ones that are valid and usable ☆13 Aug 22, 2014 Updated 11 years ago
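- Chinese edition of a comprehensive Java resource list, covering libraries, development tools, websites, blogs, WeChat, Weibo, and more, continuously updated by 伯乐在线. ☆11 Nov 20, 2016 Updated 9 years ago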
- ☆12 Sep 22, 2022 Updated 3 years ago
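- Migration tool targeting one-way migration from Oracle, MySQL, and SqlServer to PostgreSQL, and two-way migration between PostgreSQL and big data platforms such as Hive, HBase, and Impala. ☆10 Dec 3, 2014 Updated 11 years ago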
- Account registration tool for 硅基流动 (SiliconFlow) ☆15 Mar 28, 2025 Updated 11 months ago
- DeepSeek LLM: Let there be answers ☆13 Nov 30, 2023 Updated 2 years ago
- DG-IoT server SaaS platform development and deployment ☆11 Nov 11, 2021 Updated 4 years ago
- Extract data from text files or log files using regular expressions ☆16 Feb 27, 2016 Updated 10 years ago
- common java tools ☆14 Aug 13, 2014 Updated 11 years ago
- java crawler framework ☆47 Sep 1, 2022 Updated 3 years ago
- Zookeeper Leader Election Demo Application ☆10 Nov 28, 2011 Updated 14 years ago