yxlHuster / news-duplicatedView external linksLinks
文本去重算法,研究自推荐系统中新闻的去重,采用了雅虎的Near-duplicates and shingling算法,服务端用c实现,客户端用java实现,利用thrift框架进行通信,为了提高扩展性,去重可以在服务端实现,服务器也提供了计算的接口,方便客户端自己扩展
☆24Feb 25, 2014Updated 11 years ago
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below
Sorting:
- ☆11May 2, 2017Updated 8 years ago
- 迁移工具,目标是Oracle,MySQL,SqlServer到PostgreSQL的单项迁移,PostgreSQL和大数据平台Hive,Hbase,Impala等的双向迁移。☆10Dec 3, 2014Updated 11 years ago
- 一个比Spark-Parquet还快5~100倍的存储格式☆12Feb 22, 2016Updated 9 years ago
- Java基础服务器底层架构。 -- made by alzq.zhf☆27Aug 11, 2015Updated 10 years ago
- Distributed SQL base Realtime Streaming Computation Framework On Apache Storm, Spark☆12Mar 14, 2016Updated 9 years ago
- akka学习理解,使用了maven、sbt两种构建方式,同时使用量java和scala两种语言实现。akka入门,清晰理解akka流程☆13Oct 18, 2015Updated 10 years ago
- 基于ActiveMQ的数据交换中间件☆14Aug 17, 2014Updated 11 years ago
- MySQL to NoSQL real time dataflow☆18Oct 14, 2017Updated 8 years ago
- 该项目持续更新,整理保存相关学习笔记(包括数据结构、操作系统、计算机网络、数据库、JAVA、Scala、后端、SQL&NOSQL、大数据、数据挖掘等方面知识)☆14Mar 4, 2021Updated 4 years ago
- 个性化推荐算法的通用处理框架,基于Mahout和Lucene☆18May 25, 2015Updated 10 years ago
- Apache Hudi Demo☆22Apr 24, 2025Updated 9 months ago
- 读书笔记《自己动手写网络爬虫》,自己敲的代码。主要记录了网络爬虫的基本实现,网页去重的算法,网页指纹算法,文本信息挖掘☆47Jan 9, 2015Updated 11 years ago
- 基于JAVA NIO 的轻量级消息传输框架。主要功能包括:文本消息传输、二进制文件传输、文本及二进制混合传输、消息的自定义实现加密传输算法、同步或异步传输、客户端、服务端框架内置心跳监听、服务端认证、支持网络断线客户端自动重连。☆44May 12, 2017Updated 8 years ago
- 基于java开发,功能强大、配置灵活的数据库之间同步工具,可以执行多个数据同步任务,并且可以根据cron表达式配置同步的周期和时间☆45Jul 17, 2016Updated 9 years ago
- 简单的基于新闻语料的推荐算法实现☆22Dec 16, 2016Updated 9 years ago
- Real-time analytics in Apache Flume☆51Feb 2, 2016Updated 10 years ago
- FPtree algorithm to mining frequent pattern☆20Aug 6, 2013Updated 12 years ago
- 情感分类☆25Feb 23, 2014Updated 11 years ago
- 解析Mysql binlog日志并发至Kafka☆23Nov 25, 2016Updated 9 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka☆29Jun 8, 2016Updated 9 years ago
- 基于Java注解的轻量级elasticsearch搜索组件,零基础上手使用,只需在搜索字段添加注解,即可实现复杂dsl的搜索,目前已支持各种搜索,父子文档,nested等复杂检索均支持,并且可以无限级联嵌套的复杂查询☆12Oct 15, 2025Updated 4 months ago
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel…☆11Jan 23, 2026Updated 3 weeks ago
- 个人实现的基于Django与semantic-ui的语言计算实验平台, 功能包括自然语言综合处理,词语计算,社会热点计算,人物计算,文学画像,职位画像等社会计算功能☆29Mar 6, 2018Updated 7 years ago
- ☆40Aug 3, 2015Updated 10 years ago
- 推荐算法☆30Jun 5, 2015Updated 10 years ago
- 本项目转移到https://github.com/cocolian/cocolian-nlp☆34Jun 8, 2014Updated 11 years ago
- Baishop是一款B2C电子商务网站,可以生成通用的电子商务构建平台,您可以非常方便的开一个网上商店,在网上开展自己的生意。网站采用纯Java编写,基于JDK6.0,使用 MySQL数据库。☆30Dec 13, 2012Updated 13 years ago
- 是APEX贡献的一个基于大数据平台能力的数据开发平台,帮助企业以最小成本实现链接数据,构建和沉淀数仓模型,降低数据应用门槛,沉淀数据价值。☆12Oct 31, 2024Updated last year
- A batch-processing system base on Spring Boot and Spring Batch. 一个基于SpringBoot和SpringBatch的批处理系统。☆10Sep 10, 2018Updated 7 years ago
- Simplifies data migration between Apache Ignite clusters by relying on Apache Avro as an intermediate storage format☆13Jun 27, 2023Updated 2 years ago
- baike schema crawler for baidu baike , hudongbaike. 面向百度百科与互动百科的概念分类体系抓取脚本☆38Apr 25, 2018Updated 7 years ago
- Google Cloud Dataflow pipelines such as Identity-By-State as well as useful utility classes.☆37Aug 9, 2023Updated 2 years ago
- ☆38Jun 27, 2022Updated 3 years ago
- This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.☆10Mar 28, 2019Updated 6 years ago
- hadoop中Map/Reduce使用示例,输入(DBInputFormat),输出(DBOutputFormat)为MySql数据库表、日志分析Grep、单词排序Sort...对HBase的基本操作,增、删、查、改,使用Map/Reduce批量导入数据到HBase表中..…☆14Apr 6, 2013Updated 12 years ago
- flink 10 自我学习笔记和代码☆14Jun 29, 2022Updated 3 years ago
- json或SQL语言转为flink或者spark流/批任务☆12Jun 21, 2022Updated 3 years ago
- ☆11Sep 1, 2022Updated 3 years ago
- spring boot 相关使用代码☆11May 26, 2018Updated 7 years ago