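A text deduplication algorithm that grew out of research on deduplicating news in a recommender system. It adopts Yahoo's "Near-duplicates and shingling" algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can be performed on the server, and the server also exposes its computation interfaces so that clients can build their own extensions.
☆24 · Feb 25, 2014 · Updated 12 years ago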
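The shingling idea named in the description can be illustrated with a minimal sketch. This is not the repository's C or Java code: the function names `shingles`, `jaccard`, and `is_near_duplicate`, the shingle length `k=4`, and the `0.8` threshold are all assumptions chosen for illustration. Each document is reduced to its set of overlapping k-character substrings, and two documents are treated as near-duplicates when the Jaccard similarity of those sets is high.

```python
def shingles(text, k=4):
    """Return the set of k-character shingles of text (hypothetical helper).

    Whitespace runs are collapsed first so that formatting differences
    do not produce different shingles.
    """
    normalized = " ".join(text.split())
    if len(normalized) < k:
        # Text shorter than k yields a single shingle (or none if empty).
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=4, threshold=0.8):
    """Assumed decision rule: similarity at or above the threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

Comparing every pair of full shingle sets does not scale to large collections; practical systems usually compare compact sketches of the sets (for example, min-hash signatures) rather than the sets themselves.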
Alternatives and similar repositories for news-duplicated
Users that are interested in news-duplicated are comparing it to the libraries listed below.
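- ☆11 · May 2, 2017 · Updated 8 years ago
- Document deduplication for addressing semantically duplicated documents in search engines, using a semantic fingerprint algorithm based on multiple hashes. ☆12 · Aug 17, 2013 · Updated 12 years ago
- Sentiment classification ☆24 · Feb 23, 2014 · Updated 12 years ago
- A system that intelligently recommends Wanfang paper entries and generates keyword storylines ☆11 · Jun 24, 2018 · Updated 7 years ago
- Hive SQL parser ☆11 · Aug 27, 2014 · Updated 11 years ago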
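- A migration tool aimed at one-way migration from Oracle, MySQL, and SQL Server to PostgreSQL, and two-way migration between PostgreSQL and big data platforms such as Hive, HBase, and Impala. ☆10 · Dec 3, 2014 · Updated 11 years ago
- Notes on learning and understanding Akka, built with both Maven and sbt and implemented in both Java and Scala. An introduction to Akka that gives a clear understanding of the Akka workflow ☆13 · Oct 18, 2015 · Updated 10 years ago
- Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark ☆12 · Mar 14, 2016 · Updated 10 years ago
- Q-learning and DQN ☆10 · Mar 14, 2022 · Updated 4 years ago
- FP-tree algorithm for mining frequent patterns ☆20 · Aug 6, 2013 · Updated 12 years ago
- In group learning ☆10 · Jul 30, 2019 · Updated 6 years ago
- OCR for lab test reports, aiming to recognize test item entries, result values, and their positions ☆10 · Mar 12, 2022 · Updated 4 years ago
- Some improvements to and applications of word2vec. ☆13 · May 18, 2017 · Updated 8 years ago
- MySQL to NoSQL real-time dataflow ☆19 · Oct 14, 2017 · Updated 8 years ago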
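- Session-based Recommendations with Recurrent Neural Networks ☆14 · Dec 14, 2017 · Updated 8 years ago
- Data exchange middleware based on ActiveMQ ☆14 · Aug 17, 2014 · Updated 11 years ago
- News recommender system ☆11 · Aug 14, 2019 · Updated 6 years ago
- Converts OCR and other models from Paddle to ONNX format and experiments with loading these ONNX models for inference using DJL, a Java deep learning framework. ☆13 · Nov 15, 2022 · Updated 3 years ago
- LaTeX template for master's and doctoral theses at Changsha University of Science and Technology ☆16 · Mar 4, 2024 · Updated 2 years ago
- ☆12 · Oct 12, 2021 · Updated 4 years ago
- A WAF that uses nginx Lua for front-end defense and Hadoop for user behavior analysis ☆11 · Nov 17, 2016 · Updated 9 years ago
- Gecco is a lightweight, easy-to-use web crawler written in Java. Gecco integrates frameworks such as jsoup, httpclient, fastjson, spring, htmlunit, and redission, so you can write a crawler quickly just by configuring a few jQuery-style selectors. The Gecco framework has… ☆12 · Mar 9, 2017 · Updated 9 years ago
- The TensorFlow implementation of "Review-driven Answer Generation for Product-related Questions in E-commerce", WSDM 2019. ☆24 · Nov 5, 2022 · Updated 3 years ago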
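- A lightweight message transport framework based on Java NIO. Main features: text message transport, binary file transport, mixed text and binary transport, custom encryption algorithms for messages, synchronous or asynchronous transport, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after a network disconnection. ☆43 · May 12, 2017 · Updated 8 years ago
- Recommender system study ☆66 · Oct 27, 2013 · Updated 12 years ago
- A powerful, flexibly configurable database-to-database synchronization tool written in Java. It can run multiple data synchronization tasks, with the sync period and time configured through cron expressions ☆45 · Jul 17, 2016 · Updated 9 years ago
- Integrates Spring Boot with Vue to separate the front end from the back end. The server-side Java code is built on the SSM framework ☆19 · Mar 13, 2018 · Updated 8 years ago
- TextCNN for advertising detection ☆11 · Jan 12, 2024 · Updated 2 years ago
- Real-time analytics in Apache Flume ☆51 · Feb 2, 2016 · Updated 10 years ago
- A plugin wrapper that launches Amap (Gaode), Tencent, and Baidu maps from H5 pages for in-car navigation ☆10 · Dec 10, 2022 · Updated 3 years ago
- SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka ☆29 · Jun 8, 2016 · Updated 9 years ago
- Basic data cleaning utilities covering dates, times, numbers, strings, characters, currency, databases (mysql, postgresql, mongodb, hbase, hdfs, memcached), encryption and decryption (md5, sha, base64, aes, rsa), files, HTTP services, regular expressions, and more. Updates will continue. ☆13 · Jul 25, 2018 · Updated 7 years ago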
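- Apache Hudi Demo ☆21 · Apr 24, 2025 · Updated 11 months ago
- OpenMemory is your personal memory layer for large language models: private, portable, and open source. Your memories are stored locally, giving you full control over your data. Build AI applications with personalized memory while keeping your data secure. ☆59 · Jun 7, 2025 · Updated 9 months ago
- Parses MySQL binlog logs and sends them to Kafka ☆23 · Nov 25, 2016 · Updated 9 years ago
- A message middleware for sending in-site messages ☆10 · Apr 7, 2015 · Updated 10 years ago
- Golang WeChat development toolkit ☆10 · Jul 10, 2018 · Updated 7 years ago
- Proof of concept prototype to perform distributed training using BVLC/caffe, based on a parameter server implementation using MPI. Data p… ☆13 · May 7, 2015 · Updated 10 years ago
- 灵犀模驱 – an open-source low-code model-driven development framework ☆13 · Dec 10, 2022 · Updated 3 years ago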