Spring integration of webmagic, mybatis, and dungproxy
☆29Jun 14, 2023Updated 2 years ago
Alternatives and similar repositories for tom-crawler
Users that are interested in tom-crawler are comparing it to the libraries listed below
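- The road to getting stronger starts with a self-built RPC~☆14Jun 17, 2022Updated 3 years ago
- Using webmagic to crawl my favorite NetEase Cloud Music playlists and comments☆51Sep 23, 2017Updated 8 years ago
- A data visualization project built in practice with EChartsAnnotation☆10Mar 14, 2016Updated 9 years ago
- Processes videos by modifying the video file to change its MD5, making each video unique so that it is no longer matched by instant upload (deduplication) and no longer blocked.☆10Dec 2, 2015Updated 10 years ago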
- React Application Template for creating portals with Embedded Tableau Dashboards☆11Jun 7, 2022Updated 3 years ago
- A Spring user-agent resolver for server-side detection of browser and operating system.☆10Mar 16, 2017Updated 8 years ago
- Face hashing using neural networks, mapping images to Hamming codes.☆10Dec 21, 2018Updated 7 years ago
- This project is a stripped-down version of Spring; reimplementing Spring from scratch is a way to understand the essence of the framework.☆12Jan 27, 2019Updated 7 years ago
- POC for all the stack of big data (kafka, spark, cassandra, hdfs, docker, springboot)☆12Dec 16, 2022Updated 3 years ago
- Windows Live API binding and connect support.☆18Dec 1, 2024Updated last year
- A Kudu extension for Sparklyr☆11Feb 3, 2021Updated 5 years ago
- source code and data☆15Jan 16, 2019Updated 7 years ago
- A Spring Boot project built around CMS business requirements☆13Sep 27, 2018Updated 7 years ago
- ☆10Feb 26, 2019Updated 7 years ago
- shard.py builds on the web.py database access module, adding read/write splitting, sharding, and ORM features☆23Aug 8, 2013Updated 12 years ago
- Base hadoop/spark/bigdata image with advanced config loading scripts.☆11Nov 3, 2020Updated 5 years ago
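- The Beehive (蜂巢) crawler system can crawl websites and apps with nothing more than XPath definitions. It supports multiple parsing methods (XPath, regular expressions), multiple download methods (the HttpClient library, PhantomJs, Selenium), and multiple output formats (Excel, MongoDB). It can be published without any modification to Yar…☆10Sep 5, 2016Updated 9 years ago
- The Maven version of DistributeCrawler☆10Jun 20, 2022Updated 3 years ago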
- Autoproxy automatically detects proxies and stores them in the respective environment variables (e.g. http_proxy).☆13Oct 2, 2016Updated 9 years ago
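- Machine learning: 1) offline statistics (simply aggregating the data) and offline recommendation (based on the LFM latent factor model, using the ALS algorithm, with RMSE computed from the minimum variance); 2) real-time recommendation: based on the movie the user watched most recently, find similar movies (the similarity matrix comes from the previous requirement) as candidates, then combine them with recently rated movies to derive priorities; 3) content-based (电…☆13Mar 21, 2019Updated 6 years ago
- Flink study notes, covering DataSet, DataStream, Window, caching, Source, and Sink, plus watermarks and sample code☆12Jul 22, 2023Updated 2 years ago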
- riemann tool for cassandra☆32May 19, 2016Updated 9 years ago
- A Java-based algorithm that automatically detects the outliers in a set of data, eliminates these outliers, and fi…☆12May 11, 2021Updated 4 years ago
- Implementing java based text extractors as web APIs (currently only Boilerpipe & Goose)☆16Apr 1, 2012Updated 13 years ago
- Data collection and data analysis☆10Aug 10, 2015Updated 10 years ago
- Apache Jena Fuseki extension module for receiving data over Apache Kafka topics.☆15Updated this week
- Automatic CAPTCHA decoding☆11Apr 17, 2012Updated 13 years ago
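- Detects class conflicts among the JARs on the classpath; typically used to check for class conflicts in a web application's lib directory☆10Aug 20, 2015Updated 10 years ago
- An algorithm for automatically extracting the main text of web pages, implemented in Java☆111Apr 18, 2017Updated 8 years ago
- A distributed crawler framework: simulates user requests with webdriver, passes messages via Kafka, uses HBase for distributed page storage, parses with multithreaded asynchronous tasks, and provides base services such as a proxy IP service and a number verification service; the proxy page is accessed via H5 and web versions☆13Dec 18, 2015Updated 10 years ago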
- This is a Kubernetes deployment for Ansible's AWX. Work in progress.☆14Feb 20, 2019Updated 7 years ago
- alibaba/Sentinel zuul integration sample☆11Oct 20, 2018Updated 7 years ago
- Greenplum docs in Chinese☆17Jan 24, 2018Updated 8 years ago
- presto's elasticsearch connector☆11Dec 7, 2016Updated 9 years ago
- Just a DEMO to demonstrate how to use JNA to type chars into alipay's password edit control automatically.☆12Dec 21, 2017Updated 8 years ago
- a large-scale graph database created as a combination of multiple taxonomy backbones extracted from 5 existing knowledge graphs, namely: …☆14Jan 23, 2024Updated 2 years ago
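- Examples from the relevant chapters of the book 《分布式实时计算框架原理及实践案例》 (Distributed Real-Time Computing Frameworks: Principles and Practical Cases)☆11Jul 11, 2016Updated 9 years ago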
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- Apache Hadoop 3 Quick Start Guide, published by Packt☆14Apr 14, 2023Updated 2 years ago