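A text deduplication algorithm, developed for deduplicating news in a recommender system. It uses Yahoo's Near-duplicates and shingling algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can run on the server, and the server also exposes its computation interfaces so that clients can build their own extensions.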
☆24 · Feb 25, 2014 · Updated 12 years ago
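As a rough illustration of the shingling idea named in the description, here is a minimal sketch. It is not taken from the news-duplicated code. It assumes character-level shingles (convenient for Chinese text, which needs no tokenizer), a hypothetical shingle width `w = 4`, and a hypothetical similarity threshold of `0.8`; the function names are also made up for this example.

```python
def shingles(text, w=4):
    """Return the set of contiguous w-character shingles of text.

    Whitespace is removed first. The width w is an illustrative
    assumption, not a value taken from the repository.
    """
    normalized = "".join(text.split())
    if len(normalized) < w:
        # Too short for a full shingle: treat the whole string as one.
        return {normalized} if normalized else set()
    return {normalized[i:i + w] for i in range(len(normalized) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the shingle sets of a and b (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, w=4, threshold=0.8):
    """Flag a pair as near-duplicate when resemblance reaches the threshold.

    The threshold is a hypothetical default chosen for illustration.
    """
    return resemblance(a, b, w) >= threshold
```

Comparing every pair of documents this way is quadratic in the number of documents, so a production system would typically hash or sample the shingles first; that step is outside this sketch.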
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below.
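- ☆11 · May 2, 2017 · Updated 9 years ago
- Document deduplication that addresses semantically duplicate documents in search engines, using a semantic fingerprint algorithm based on multiple hashing. ☆11 · Aug 17, 2013 · Updated 12 years ago
- A storage format 5 to 100 times faster than Spark-Parquet ☆12 · Feb 22, 2016 · Updated 10 years ago
- hive sql parser ☆11 · Aug 27, 2014 · Updated 11 years ago
- Low-level infrastructure for a basic Java server. -- made by alzq.zhf ☆27 · Aug 11, 2015 · Updated 10 years ago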
- Notes on learning and understanding Akka, built with both Maven and sbt and implemented in both Java and Scala. An Akka primer for understanding the Akka workflow clearly. ☆13 · Oct 18, 2015 · Updated 10 years ago
- Guides users through a stricter 7-step agile PM workflow Skill. Invoke when users have a rough product idea and need interactive HTML pr… ☆43 · Mar 31, 2026 · Updated last month
- Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark ☆12 · Mar 14, 2016 · Updated 10 years ago
- Toutiao (今日头条) as, cp, and signature algorithms; fetches trending news and user articles (body text and comments) ☆12 · Aug 13, 2019 · Updated 6 years ago
- FP-tree algorithm for mining frequent patterns ☆20 · Aug 6, 2013 · Updated 12 years ago
- A simple recommendation algorithm implementation based on a news corpus ☆23 · Dec 16, 2016 · Updated 9 years ago
- A news text classifier based on the naive Bayes algorithm ☆13 · Jan 12, 2018 · Updated 8 years ago
- 📚 A Go port of caj2pdf/caj2pdf ☆10 · Feb 23, 2023 · Updated 3 years ago
- News recommendation system ☆11 · Aug 14, 2019 · Updated 6 years ago
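- ☆12 · Oct 12, 2021 · Updated 4 years ago
- Crawls the society news section of the China News Service website (中国新闻网) and analyzes trending news events by keyword ☆19 · Mar 8, 2020 · Updated 6 years ago
- Ad recommendation system ☆21 · Aug 12, 2014 · Updated 11 years ago
- A lightweight message queue built in Java, supporting publish-subscribe and point-to-point modes ☆34 · Mar 18, 2019 · Updated 7 years ago
- Gecco is a lightweight, easy-to-use web crawler written in Java. Gecco integrates frameworks such as jsoup, httpclient, fastjson, spring, htmlunit, and redission, so you can write a crawler quickly just by configuring a few jQuery-style selectors. The Gecco framework has… ☆12 · Mar 9, 2017 · Updated 9 years ago
- The TensorFlow implementation of "Review-driven Answer Generation for Product-related Questions in E-commerce", WSDM 2019. ☆24 · Nov 5, 2022 · Updated 3 years ago
- A lightweight message transport framework based on Java NIO. Main features: text message transfer, binary file transfer, mixed text and binary transfer, custom encryption algorithms for message transfer, synchronous or asynchronous transfer, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after a network disconnection. ☆43 · May 12, 2017 · Updated 8 years ago
- A general-purpose processing framework for personalized recommendation algorithms, based on Mahout and Lucene ☆18 · May 25, 2015 · Updated 10 years ago
- Recommender system study ☆66 · Oct 27, 2013 · Updated 12 years ago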
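- Please give it a star, thanks! Multiple HTML5 video tags: captures a poster (cover image) from the video source and adds monitoring of the video playback state. ☆10 · Feb 23, 2021 · Updated 5 years ago
- SpringBoot integrated with Vue for front-end/back-end separation. Uses the SSM framework for the server-side Java code. ☆19 · Mar 13, 2018 · Updated 8 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆30 · Jun 8, 2016 · Updated 9 years ago
- Learning Netty. WeChat official account: 匠心零度 (follow for more past highlights) ☆18 · Nov 2, 2019 · Updated 6 years ago
- A Java version of the FTRL algorithm ☆24 · Apr 28, 2017 · Updated 9 years ago
- SkillX: Automatically Constructing Skill Knowledge Bases for Agents ☆107 · Apr 30, 2026 · Updated last week
- A curated collection of security-related mind maps ☆11 · Sep 7, 2015 · Updated 10 years ago
- A text classification project aimed mainly at machine learning beginners and people testing text classification performance. It includes several classification algorithms (naive Bayes, cosine similarity, logistic regression) and the mm and rmm word segmenters, along with more than 6,000 articles across multiple categories crawled from a news site and a Chinese dictionary. Classifiers and segmenters are easy to extend and can be combined to test classification performance. ☆37 · Sep 29, 2017 · Updated 8 years ago
- Parses MySQL binlog logs and sends them to Kafka ☆23 · Nov 25, 2016 · Updated 9 years ago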
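- A message middleware for sending in-site messages ☆10 · Apr 7, 2015 · Updated 11 years ago
- golang WeChat development toolkit ☆10 · Jul 10, 2018 · Updated 7 years ago
- A visualization platform for COVID-19 public opinion and news; won a bronze award in the epidemic visualization competition organized by the China Computer Federation, Alibaba Cloud, Synced (机器之心), and others. 🔥 ☆42 · Mar 12, 2021 · Updated 5 years ago
- Multi-Engine is a Java framework for distributed parallel processing, whose kernel is Multi-Task. ☆15 · Feb 4, 2017 · Updated 9 years ago
- OpenMemory is your personal memory layer for large language models: private, portable, and open source. Your memories are stored locally, giving you full control over your data. Build AI applications with personalized memory while keeping your data secure. ☆60 · Jun 7, 2025 · Updated 11 months ago
- Central aims to provide a simple user system with basic features such as registration, login, SMS sending, and reminder generation, and it hooks into Mountable Engines built on the user system. ☆14 · Dec 27, 2013 · Updated 12 years ago
- Spring Cloud Zuul routes health indicator ☆11 · Dec 25, 2015 · Updated 10 years ago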