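A text deduplication algorithm that grew out of work on deduplicating news in a recommender system. It uses Yahoo's Near-duplicates and shingling algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can run on the server, and the server also exposes computation interfaces so that clients can build their own extensions.
☆24 · Feb 25, 2014 · Updated 12 years ago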
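The shingling approach named in the description can be sketched roughly as follows. This is a minimal illustration and not the repository's C or Java code: the function names `shingles`, `jaccard`, and `is_near_duplicate`, the shingle size `k=3`, and the `0.5` threshold are all assumptions made for this example. It also splits on whitespace, which suits English text; Chinese news would more likely use character-level shingles.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Flag two documents as near-duplicates when their shingle sets overlap enough.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy cat"
    print(jaccard(shingles(a), shingles(b)))  # 6 shared shingles out of 8 total
    print(is_near_duplicate(a, b))
```

Comparing every pair of documents this way is quadratic in the number of documents, so production systems typically compress the shingle sets into short signatures (for example with MinHash) before comparing; that step is omitted here.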
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below.
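- ☆11 · May 2, 2017 · Updated 8 years ago
- Sentiment classification ☆24 · Feb 23, 2014 · Updated 12 years ago
- A storage format that is 5 to 100 times faster than Spark-Parquet ☆12 · Feb 22, 2016 · Updated 10 years ago
- A system that intelligently recommends Wanfang paper entries and generates keyword storylines ☆11 · Jun 24, 2018 · Updated 7 years ago
- Hive SQL parser ☆11 · Aug 27, 2014 · Updated 11 years ago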
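- A migration tool aimed at one-way migration from Oracle, MySQL, and SQL Server to PostgreSQL, and two-way migration between PostgreSQL and big-data platforms such as Hive, HBase, and Impala. ☆10 · Dec 3, 2014 · Updated 11 years ago
- Low-level architecture for a basic Java server. -- made by alzq.zhf ☆27 · Aug 11, 2015 · Updated 10 years ago
- CTF competition: analysis of password collisions against the weak VxWorks hash algorithm ☆14 · Aug 9, 2018 · Updated 7 years ago
- Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark ☆12 · Mar 14, 2016 · Updated 10 years ago
- Q-learning and DQN ☆10 · Mar 14, 2022 · Updated 4 years ago
- FP-tree algorithm for mining frequent patterns ☆20 · Aug 6, 2013 · Updated 12 years ago
- OCR for lab test reports, aiming to recognize test item entries, result values, and their positions ☆10 · Mar 12, 2022 · Updated 4 years ago
- Some improvements to and applications of word2vec. ☆13 · May 18, 2017 · Updated 8 years ago
- A program that crawls and cleans data from the Wanfang paper database to build a corpus ☆12 · Jun 9, 2018 · Updated 7 years ago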
- ☆11 · Jul 1, 2019 · Updated 6 years ago
- MySQL to NoSQL real-time dataflow ☆19 · Oct 14, 2017 · Updated 8 years ago
- Session-based Recommendations with Recurrent Neural Networks ☆14 · Dec 14, 2017 · Updated 8 years ago
- Data exchange middleware based on ActiveMQ ☆14 · Aug 17, 2014 · Updated 11 years ago
- 📚 A Go port of caj2pdf/caj2pdf ☆10 · Feb 23, 2023 · Updated 3 years ago
- News recommendation system ☆11 · Aug 14, 2019 · Updated 6 years ago
- LaTeX template for master's and doctoral theses at Changsha University of Science and Technology ☆16 · Mar 4, 2024 · Updated 2 years ago
- DRL or heuristic algorithms for MEC systems ☆14 · Apr 9, 2024 · Updated 2 years ago
- Gecco is a lightweight, easy-to-use web crawler written in Java. Gecco integrates frameworks such as jsoup, httpclient, fastjson, spring, htmlunit, and redission, so you can write a crawler quickly by configuring a few jQuery-style selectors. The Gecco framework has… ☆12 · Mar 9, 2017 · Updated 9 years ago
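- A lightweight message transport framework based on Java NIO. Main features: text message transfer, binary file transfer, mixed text and binary transfer, custom encryption algorithms for message transfer, synchronous or asynchronous transfer, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after a network drop. ☆43 · May 12, 2017 · Updated 8 years ago
- A general processing framework for personalized recommendation algorithms, based on Mahout and Lucene ☆18 · May 25, 2015 · Updated 10 years ago
- A continuously updated project that collects study notes (covering data structures, operating systems, computer networks, databases, Java, Scala, backend, SQL & NoSQL, big data, data mining, and more) ☆14 · Mar 4, 2021 · Updated 5 years ago
- Recommender system study ☆66 · Oct 27, 2013 · Updated 12 years ago
- hanboDB is a highly available, low-latency in-memory database system ☆38 · Apr 4, 2023 · Updated 3 years ago
- Please give it a star, thanks! Multiple HTML5 video tags: captures a cover image (poster) from the video source; adds monitoring of video playback state. ☆10 · Feb 23, 2021 · Updated 5 years ago
- A powerful, flexibly configurable database-to-database synchronization tool written in Java. It can run multiple synchronization tasks, with the sync interval and time configured through cron expressions ☆45 · Jul 17, 2016 · Updated 9 years ago
- SpringBoot integrated with Vue for front-end/back-end separation. The server-side Java code uses the SSM framework ☆19 · Mar 13, 2018 · Updated 8 years ago
- textcnn for advertising detection ☆11 · Jan 12, 2024 · Updated 2 years ago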
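- Real-time analytics in Apache Flume ☆51 · Feb 2, 2016 · Updated 10 years ago
- A plugin wrapper that launches Amap (Gaode), Tencent, and Baidu Maps from an H5 page for in-car navigation ☆10 · Dec 10, 2022 · Updated 3 years ago
- Basic data cleaning utilities covering dates, times, numbers, strings, characters, money, databases (mysql, postgresql, mongodb, hbase, hdfs, memcached), encryption and decryption (md5, sha, base64, aes, rsa), files, HTTP services, regular expressions, and more; updates will continue. ☆13 · Jul 25, 2018 · Updated 7 years ago
- Learning Netty. WeChat official account: 匠心零度 (follow for more past articles) ☆18 · Nov 2, 2019 · Updated 6 years ago
- A curated collection of security-related mind maps ☆11 · Sep 7, 2015 · Updated 10 years ago
- This text classification project is aimed at machine learning beginners and people testing text classification performance. It includes several classification algorithms (naive Bayes, cosine similarity, logistic regression) and the mm and rmm tokenizers, along with more than 6,000 articles across multiple categories crawled from a news site and a Chinese dictionary. Classifiers and tokenizers are easy to extend and can be combined to test classification performance. ☆37 · Sep 29, 2017 · Updated 8 years ago
- Apache Hudi Demo ☆21 · Apr 24, 2025 · Updated 11 months ago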