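A text deduplication algorithm, developed for deduplicating news in a recommender system. It uses Yahoo's near-duplicates and shingling algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can run on the server, and the server also exposes computation interfaces so that clients can build their own extensions.
☆24 Feb 25, 2014 Updated 12 years ago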
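As a rough illustration of the shingling idea named in the description, here is a minimal Python sketch. It is not the repository's C/Java code: the function names, the whitespace tokenization, the shingle width `w=4`, and the `0.5` threshold are all assumptions chosen for the example. Chinese news text would need a word segmenter or character-level shingles instead of `split()`.

```python
def shingles(text, w=4):
    """Return the set of contiguous w-token shingles of `text`.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < w:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the two shingle sets, in [0.0, 1.0]."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are considered identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, w=4, threshold=0.5):
    """Flag a pair as near-duplicate when resemblance reaches the threshold.

    The threshold value is hypothetical and would need tuning on real data.
    """
    return resemblance(a, b, w) >= threshold
```

For example, `is_near_duplicate("the quick brown fox jumps over the lazy dog", "the quick brown fox jumps over the lazy cat")` compares two six-shingle sets that share five shingles. Production systems usually avoid comparing full shingle sets pairwise and instead compare compact sketches (for example, min-hash signatures) derived from them.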
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below.
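- ☆10 May 2, 2017 Updated 9 years ago
- Document deduplication for addressing semantically duplicate documents in search engines, using a semantic fingerprint algorithm based on multiple hashing. ☆11 Aug 17, 2013 Updated 12 years ago
- Sentiment classification ☆24 Feb 23, 2014 Updated 12 years ago
- A storage format 5 to 100 times faster than Spark-Parquet ☆12 Feb 22, 2016 Updated 10 years ago
- A system that intelligently recommends Wanfang paper entries and generates keyword storylines ☆11 Jun 24, 2018 Updated 7 years ago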
- Hive SQL parser ☆11 Aug 27, 2014 Updated 11 years ago
- A migration tool targeting one-way migration from Oracle, MySQL, and SQL Server to PostgreSQL, and two-way migration between PostgreSQL and big data platforms such as Hive, HBase, and Impala. ☆10 Dec 3, 2014 Updated 11 years ago
- Low-level architecture for a basic Java server. -- made by alzq.zhf ☆27 Aug 11, 2015 Updated 10 years ago
- Notes on learning and understanding Akka, built with both Maven and sbt and implemented in both Java and Scala. An introduction to Akka for clearly understanding its workflow. ☆13 Oct 18, 2015 Updated 10 years ago
- CTF competition: analysis of password collisions in the weak VxWorks hash algorithm ☆14 Aug 9, 2018 Updated 7 years ago
- Guides users through a stricter 7-step agile PM workflow Skill. Invoke when users have a rough product idea and need interactive HTML pr… ☆59 Mar 31, 2026 Updated 2 months ago
- Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark ☆12 Mar 14, 2016 Updated 10 years ago
- Q learning and DQN ☆10 Mar 14, 2022 Updated 4 years ago
- FP-tree algorithm for mining frequent patterns ☆20 Aug 6, 2013 Updated 12 years ago
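- OCR for lab test reports, aiming to recognize test item entries, result values, and their positions ☆10 Mar 12, 2022 Updated 4 years ago
- A program that crawls and cleans data from the Wanfang paper database to build a corpus ☆13 Jun 9, 2018 Updated 8 years ago
- ☆11 Jul 1, 2019 Updated 6 years ago
- A simple implementation of a recommendation algorithm based on a news corpus ☆23 Dec 16, 2016 Updated 9 years ago
- Data exchange middleware based on ActiveMQ ☆14 Aug 17, 2014 Updated 11 years ago
- LaTeX template for master's and doctoral theses at Changsha University of Science and Technology ☆17 Mar 4, 2024 Updated 2 years ago
- Converts OCR and other models from Paddle to ONNX format, then loads these ONNX models with DJL, a Java deep learning framework, to try inference. ☆14 Nov 15, 2022 Updated 3 years ago
- ☆12 Oct 12, 2021 Updated 4 years ago
- Ad recommendation system ☆21 Aug 12, 2014 Updated 11 years ago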
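- DRL or Heuristic algorithms for MEC system ☆14 Apr 9, 2024 Updated 2 years ago
- A lightweight message queue built in Java, supporting subscription and point-to-point modes ☆34 Mar 18, 2019 Updated 7 years ago
- The TensorFlow implementation of "Review-driven Answer Generation for Product-related Questions in E-commerce", WSDM 2019. ☆24 Nov 5, 2022 Updated 3 years ago
- A lightweight message transport framework based on Java NIO. Main features: text message transfer, binary file transfer, mixed text and binary transfer, custom encryption algorithms for message transfer, synchronous or asynchronous transfer, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after a network disconnection. ☆44 May 12, 2017 Updated 9 years ago
- A continuously updated project that organizes and stores study notes (covering data structures, operating systems, computer networks, databases, Java, Scala, backend, SQL & NoSQL, big data, data mining, and more) ☆14 Mar 4, 2021 Updated 5 years ago
- Recommender system study ☆66 Oct 27, 2013 Updated 12 years ago
- hanboDB is a highly available, low-latency in-memory database system ☆38 Apr 4, 2023 Updated 3 years ago
- Please give it a star, thanks~ Multiple HTML5 video tags: captures a cover image (poster) from the video source; adds monitoring of the video playback state. ☆10 Feb 23, 2021 Updated 5 years ago
- Integrates Spring Boot with Vue to separate the front end from the back end. Uses the SSM framework for the server-side Java code. ☆19 Mar 13, 2018 Updated 8 years ago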
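- A powerful, flexibly configurable database-to-database synchronization tool written in Java. It can run multiple synchronization tasks, with the sync interval and time configured through cron expressions. ☆46 Jul 17, 2016 Updated 9 years ago
- A plugin wrapper for launching Amap, Tencent, and Baidu maps from H5 pages for in-car navigation ☆10 Dec 10, 2022 Updated 3 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆30 Jun 8, 2016 Updated 10 years ago
- DEPRECATED: Simple, fast user news feeds for Django ☆52 Jan 2, 2019 Updated 7 years ago
- Basic data cleaning utilities covering dates, times, numbers, strings, characters, currency, databases (MySQL, PostgreSQL, MongoDB, HBase, HDFS, Memcached), encryption and decryption (MD5, SHA, Base64, AES, RSA), files, HTTP services, regular expressions, and more; to be updated continuously. ☆13 Jul 25, 2018 Updated 7 years ago
- Learning Netty. WeChat official account: 匠心零度 [follow for more great past posts] ☆18 Nov 2, 2019 Updated 6 years ago
- A curated collection of security-related mind maps ☆11 Sep 7, 2015 Updated 10 years ago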