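Text deduplication algorithm, developed from research on deduplicating news in recommender systems. It adopts Yahoo's Near-duplicates and shingling algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can be performed on the server side, and the server also exposes computation interfaces so that clients can build their own extensions.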
☆24 · Feb 25, 2014 · Updated 12 years ago
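Shingling turns each document into a set of overlapping fragments and treats two documents as near-duplicates when those sets overlap strongly. Jaccard similarity (shared shingles divided by all distinct shingles) is the usual measure. The Java sketch below illustrates only that core idea. It assumes character k-shingles with whitespace removed, which suits unsegmented Chinese news text, plus exact Jaccard similarity and a caller-chosen threshold. The class `ShingleDedup` and its methods are hypothetical and are not the repository's C server or Java/Thrift client code.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch of shingling-based near-duplicate detection.
 * Assumptions (not taken from the news-duplicated repository): character
 * k-shingles with whitespace removed, exact Jaccard similarity, and a
 * caller-chosen similarity threshold.
 */
public class ShingleDedup {

    /** Returns the set of contiguous k-character substrings (shingles) of text. */
    public static Set<String> shingles(String text, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> result = new HashSet<>();
        // Drop all whitespace so line breaks and spacing do not create distinct shingles.
        String normalized = text.replaceAll("\\s+", "");
        for (int i = 0; i + k <= normalized.length(); i++) {
            result.add(normalized.substring(i, i + k));
        }
        return result;
    }

    /** Jaccard similarity |A intersect B| / |A union B|; two empty sets count as identical. */
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Hypothetical rule: texts are near-duplicates if shingle-set similarity reaches the threshold. */
    public static boolean isNearDuplicate(String x, String y, int k, double threshold) {
        return jaccard(shingles(x, k), shingles(y, k)) >= threshold;
    }
}
```

For example, with `k = 2` the texts "abcd" and "abce" yield the shingle sets {ab, bc, cd} and {ab, bc, ce}. These share 2 of 4 distinct shingles, so `jaccard` returns 0.5 and `isNearDuplicate("abcd", "abce", 2, 0.5)` is true. Comparing full shingle sets is exact but costly for large collections. That cost is why shingling is commonly paired with compact hash sketches.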
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below.
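- ☆11 · May 2, 2017 · Updated 9 years ago
- Document deduplication, built to solve the problem of semantically duplicated documents in search engines; the method is a semantic-fingerprint algorithm based on multiple hashes. ☆11 · Aug 17, 2013 · Updated 12 years ago
- Sentiment classification ☆24 · Feb 23, 2014 · Updated 12 years ago
- A storage format 5 to 100 times faster than Spark-Parquet ☆12 · Feb 22, 2016 · Updated 10 years ago
- A system that intelligently recommends Wanfang paper entries and generates keyword storylines ☆11 · Jun 24, 2018 · Updated 7 years ago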
- hive sql parser ☆11 · Aug 27, 2014 · Updated 11 years ago
- Migration tool for one-way migration from Oracle, MySQL, and SQL Server to PostgreSQL, and two-way migration between PostgreSQL and big-data platforms such as Hive, HBase, and Impala. ☆10 · Dec 3, 2014 · Updated 11 years ago
- Low-level architecture for a basic Java server. -- made by alzq.zhf ☆27 · Aug 11, 2015 · Updated 10 years ago
- Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark ☆12 · Mar 14, 2016 · Updated 10 years ago
- Python script for fast deduplication of text data at the scale of tens of millions of records ☆19 · Mar 14, 2016 · Updated 10 years ago
- FP-tree algorithm for mining frequent patterns ☆20 · Aug 6, 2013 · Updated 12 years ago
- Some improvements to and applications of word2vec. ☆13 · May 18, 2017 · Updated 9 years ago
- MySQL to NoSQL real time dataflow ☆19 · Oct 14, 2017 · Updated 8 years ago
- 📚 A Go port for caj2pdf/caj2pdf ☆10 · Feb 23, 2023 · Updated 3 years ago
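- News recommendation system ☆11 · Aug 14, 2019 · Updated 6 years ago
- Code related to using Spring Boot ☆11 · May 26, 2018 · Updated 8 years ago
- ☆12 · Oct 12, 2021 · Updated 4 years ago
- Crawls the society-news section of China News (chinanews.com) and analyzes trending news events through keywords ☆19 · Mar 8, 2020 · Updated 6 years ago
- A WAF that uses nginx + Lua for front-end defense and Hadoop for user-behavior analysis ☆11 · Nov 17, 2016 · Updated 9 years ago
- A lightweight message queue built in Java, supporting publish/subscribe and point-to-point modes ☆34 · Mar 18, 2019 · Updated 7 years ago
- The Tensorflow implementation of "Review-driven Answer Generation for Product-related Questions in E-commerce ", WSDM 2019. ☆24 · Nov 5, 2022 · Updated 3 years ago
- Gecco is a lightweight, easy-to-use web crawler written in Java. Gecco integrates excellent frameworks such as jsoup, httpclient, fastjson, spring, htmlunit, and redission, so you only need to configure some jQuery-style selectors to write a crawler quickly. The Gecco framework has… ☆12 · Mar 9, 2017 · Updated 9 years ago
- A lightweight message-transport framework based on Java NIO. Main features: text message transport, binary file transport, mixed text and binary transport, custom encryption algorithms for messages, synchronous or asynchronous transport, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after network disconnects. ☆43 · May 12, 2017 · Updated 9 years ago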
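- A general-purpose processing framework for personalized recommendation algorithms, based on Mahout and Lucene ☆18 · May 25, 2015 · Updated 11 years ago
- Recommender system study ☆66 · Oct 27, 2013 · Updated 12 years ago
- hanboDB is a highly available, low-latency in-memory database system ☆38 · Apr 4, 2023 · Updated 3 years ago
- A powerful, flexibly configurable Java tool for synchronizing data between databases; it can run multiple sync tasks and schedule each sync's period and time with cron expressions ☆45 · Jul 17, 2016 · Updated 9 years ago
- Real-time analytics in Apache Flume ☆51 · Feb 2, 2016 · Updated 10 years ago
- A wrapped plugin that launches Amap (Gaode), Tencent, or Baidu Maps from an H5 page for in-car navigation ☆10 · Dec 10, 2022 · Updated 3 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆30 · Jun 8, 2016 · Updated 9 years ago
- Learning Netty. WeChat official account: 匠心零度 (follow for more past highlights) ☆18 · Nov 2, 2019 · Updated 6 years ago
- A Java version of the FTRL algorithm ☆24 · Apr 28, 2017 · Updated 9 years ago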
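- A collection of security-related mind maps ☆11 · Sep 7, 2015 · Updated 10 years ago
- Apache Hudi Demo ☆21 · Apr 24, 2025 · Updated last year
- Bindings to CoreVideo.framework for macOS and iOS ☆16 · Jan 4, 2022 · Updated 4 years ago
- Golang WeChat development tools ☆10 · Jul 10, 2018 · Updated 7 years ago
- Multi-Engine is a Java framework for distributed parallel processing, whose kernel is Multi-Task. ☆15 · Feb 4, 2017 · Updated 9 years ago
- OpenMemory is your personal memory layer for large language models: private, portable, and open source. Your memories are stored locally, giving you full control over your data. Build AI applications with personalized memory while keeping your data secure. ☆61 · Jun 7, 2025 · Updated 11 months ago
- Central aims to provide a simple user system with basic features such as registration, login, sending SMS messages, and generating reminders, and it hooks into Mountable Engines built on that user system. ☆14 · Dec 27, 2013 · Updated 12 years ago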
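Some entries in this list, notably the document-deduplication project described as a semantic-fingerprint algorithm based on multiple hashes, compare documents through compact hash-based fingerprints rather than full texts. A well-known technique of this kind is MinHash. Each of n hash functions records the minimum hash value over a document's shingle set, and the fraction of positions where two signatures agree estimates the sets' Jaccard similarity. The Java sketch below assumes this technique for illustration only; the listings do not say which scheme those projects actually use. The class `MinHashSketch`, the hash family (a*x + b) mod p, and the parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.Set;

/**
 * Minimal MinHash sketch (an illustrative assumption, not code from any listed repository).
 * Each of n hash functions h_i(x) = (a_i * x + b_i) mod p keeps its minimum over a set;
 * the fraction of matching minimums between two signatures estimates Jaccard similarity.
 */
public class MinHashSketch {
    private static final long PRIME = 2147483647L; // Mersenne prime 2^31 - 1
    private final long[] a;
    private final long[] b;

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) (PRIME - 1)); // multiplier in [1, p-1], never zero
            b[i] = rnd.nextInt((int) PRIME);           // offset in [0, p-1]
        }
    }

    /** Computes the MinHash signature of a set of shingles. */
    public long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE); // an empty set keeps the sentinel everywhere
        for (String s : shingles) {
            // Map the shingle's hashCode into [0, p-1] before applying the hash family.
            long x = Math.floorMod((long) s.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // product < 2^62, so no long overflow
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Estimated Jaccard similarity: fraction of positions where the signatures agree. */
    public static double estimateJaccard(long[] s1, long[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signature lengths differ");
        }
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }
}
```

Signatures must come from the same `MinHashSketch` instance, or at least the same seed and hash count, to be comparable. A larger `numHashes` tightens the estimate at the cost of longer fingerprints. Because each signature has a fixed length, documents can be stored and compared without keeping their full shingle sets.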