yxlHuster/news-duplicated

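A text deduplication algorithm, developed while studying news deduplication in recommender systems. It uses Yahoo's Near-duplicates and shingling algorithm; the server is implemented in C and the client in Java, and they communicate through the Thrift framework. To improve extensibility, deduplication can be performed on the server, and the server also exposes its computation interfaces so that clients can build their own extensions.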
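Shingling-based near-duplicate detection turns each document into the set of its contiguous k-token "shingles" and compares two documents by the Jaccard resemblance of those sets. The sketch below is a minimal Python illustration of that idea, not the repository's C/Thrift server code; the function names, the regex word tokenizer, the default k = 3 and the 0.8 threshold are all assumptions chosen for illustration. For Chinese news text, the tokenizer would need to be replaced by a word segmenter or character n-grams, because `\w+` treats a run of Chinese characters as a single token.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of contiguous k-word shingles of `text` (assumed tokenizer: regex words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # A document shorter than k words becomes a single shingle so it is never empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 3) -> float:
    """Jaccard resemblance: len(S(a) & S(b)) / len(S(a) | S(b)) over the shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, k: int = 3, threshold: float = 0.8) -> bool:
    # `threshold` is a hypothetical cut-off; in practice it would be tuned on labeled pairs.
    return resemblance(a, b, k) >= threshold
```

For "the quick brown fox jumps over the lazy dog" versus the same sentence with "jumped", the two documents share 4 of 10 distinct 3-word shingles, so `resemblance` returns 0.4: one changed word touches up to k shingles, and longer documents dilute that effect.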

☆24

Alternatives and similar repositories for news-duplicated

Users that are interested in news-duplicated are comparing it to the libraries listed below.

Quincy1994 / MachineLearning
☆10 · Updated May 2, 2017 (9 years ago)
GuohuaZhuang / deduplication-detecting
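The document deduplication feature addresses the problem of semantically duplicated documents in search engines; the method is a semantic fingerprint algorithm based on multiple hashes.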
☆11 · Updated Aug 17, 2013 (12 years ago)
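The entry above does not say which multi-hash fingerprint it uses. One widely used scheme in which many hash functions jointly form a document fingerprint is MinHash: for each hash function, keep the minimum hash value over the document's tokens (or shingles), then estimate Jaccard similarity as the fraction of signature positions on which two documents agree. The Python sketch below shows MinHash as an assumed stand-in, not that repository's algorithm; the seeded BLAKE2b hash family, the function names and the default of 128 hashes are illustrative choices.

```python
import hashlib


def _seeded_hash(token: str, seed: int) -> int:
    """One member of a hash family indexed by `seed` (64-bit BLAKE2b of "seed:token")."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: set, num_hashes: int = 128) -> list:
    """For each of `num_hashes` hash functions, keep the minimum hash over the token set."""
    if not tokens:
        raise ValueError("cannot fingerprint an empty token set")
    return [min(_seeded_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of positions where two signatures agree; estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

A seeded hash from `hashlib` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make signatures from different runs incomparable.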
yxlHuster / Sentiment
Sentiment classification
☆24 · Updated Feb 23, 2014 (12 years ago)
snowlixue / wanfangPaperSystem
A system that makes intelligent recommendations over Wanfang paper entries and generates keyword storylines.
☆11 · Updated Jun 24, 2018 (8 years ago)
ycloudnet / ya100
A storage format that is 5 to 100 times faster than Spark-Parquet.
☆12 · Updated Feb 22, 2016 (10 years ago)
smallbaby / hql-parser
Hive SQL parser
☆11 · Updated Aug 27, 2014 (11 years ago)
chenzhenyang / aquila
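A migration tool targeting one-way migration from Oracle, MySQL, and SQL Server to PostgreSQL, and two-way migration between PostgreSQL and big-data platforms such as Hive, HBase, and Impala.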
☆10 · Updated Dec 3, 2014 (11 years ago)
NewBee119 / ctf_vxworks
CTF competition: analysis of password collisions in the weak VxWorks hash algorithm.
☆14 · Updated Aug 9, 2018 (7 years ago)
duoan / codes-scratch-akka
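Study notes for learning Akka, built with both Maven and sbt and implemented in both Java and Scala. An introduction to Akka that gives a clear picture of how Akka works.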
☆13 · Updated Oct 18, 2015 (10 years ago)
itiki / PythonTo-repeat-the-text-Bigdata
A Python script for fast deduplication of text data at the scale of tens of millions of records.
☆19 · Updated Mar 14, 2016 (10 years ago)
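The entry above gives no implementation details. A common way to drop exact duplicates from a very large text collection in a single streaming pass is to remember a fixed-size digest of each distinct record rather than the record itself. The generator below is a hedged sketch of that idea, not the script from that repository; it assumes one record per line, byte-for-byte matching, and a 16-byte BLAKE2b digest, and the name `dedupe_lines` is hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order.

    Only a 16-byte digest per distinct line is stored, so memory grows with the
    number of distinct records rather than with their length.
    """
    seen = set()
    for line in lines:
        digest = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield line
```

To deduplicate a file, pass the open file object and write the result out, e.g. `out.writelines(dedupe_lines(src))`. A 128-bit digest makes accidental collisions negligible even at tens of millions of lines, although in CPython each stored digest still costs several times its 16 raw bytes in object and set overhead.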
CSharpYDS / edge-computing-Q-learning
Q-learning and DQN
☆10 · Updated Mar 14, 2022 (4 years ago)
dk-stationery / stationery-ink
Distributed SQL-based real-time streaming computation framework on Apache Storm and Spark
☆12 · Updated Mar 14, 2016 (10 years ago)
BigPeng / FPtree
FP-tree algorithm for mining frequent patterns
☆20 · Updated Aug 6, 2013 (12 years ago)
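An FP-tree compresses the transactions into a prefix tree in which transactions sharing frequent items share paths, and FP-growth mines it recursively by building a conditional tree for each item from the prefix paths that end in that item. The following Python sketch illustrates that recursion on a small scale; it is not the code from that repository, and the input format (a list of `(items, count)` pairs) and the names `FPNode`, `build_tree` and `fp_growth` are assumptions for illustration.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item          # None marks the root
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return the header table and item supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)    # item -> every tree node holding that item
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties broken by item).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`, weighted by its count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, itemset)
```

On the baskets `[("a", "b"), ("a", "c"), ("a", "b", "c"), ("b",)]`, each passed with count 1 and `min_support=2`, the frequent itemsets are {a}: 3, {b}: 3, {c}: 2, {a, b}: 2 and {a, c}: 2; {b, c} occurs only once and is pruned.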
snowlixue / wanFangSpider-dataPretreatment
A program that crawls and cleans data from the Wanfang paper database to build a corpus.
☆13 · Updated Jun 9, 2018 (8 years ago)
fjiangAI / Word2Vec
Some improvements to and applications of word2vec.
☆13 · Updated May 18, 2017 (9 years ago)
ziapple / DBExchange
Data exchange middleware based on ActiveMQ.
☆14 · Updated Aug 17, 2014 (11 years ago)
nafg / mill-bundler
JavaScript module resolution and bundling for the Mill build tool
☆12 · Updated Feb 17, 2026 (5 months ago)
rees46 / rnn_recommendations
Session-based Recommendations with Recurrent Neural Networks
☆14 · Updated Dec 14, 2017 (8 years ago)
w4ngzhen / intellij-jcef-plugin
☆12 · Updated Oct 12, 2021 (4 years ago)
jiangnanboy / doc_ai
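Converts OCR and other models from Paddle to ONNX format, and experiments with loading these ONNX models for inference with DJL, the Java deep learning framework.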
☆14 · Updated Nov 15, 2022 (3 years ago)
xiexikang / html5-video-poster
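Stars appreciated, thanks! Working with multiple HTML5 video tags: captures a poster (cover image) from the video source and adds listeners for the video's playback state.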
☆10 · Updated Feb 23, 2021 (5 years ago)
Grox888 / Mobile_Edge_Computing
DRL or heuristic algorithms for MEC systems
☆14 · Updated Apr 9, 2024 (2 years ago)
kangxiatao / CSUSTthesis
LaTeX template for master's and doctoral theses at Changsha University of Science and Technology
☆17 · Updated Mar 4, 2024 (2 years ago)
stalary / lightMQ
A lightweight message queue built in Java, supporting publish/subscribe and point-to-point modes.
☆34 · Updated Mar 18, 2019 (7 years ago)
NanGePlus / PydanticAITest
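Uses the open-source PydanticAI framework to build a Text2SQL application on PostgreSQL and MySQL that generates SQL statements; supports GPT models, Chinese domestic large models, and open-source local large models.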
☆17 · Updated Dec 26, 2024 (last year)
qzw1210 / geeco
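Gecco is a lightweight, easy-to-use web crawler written in Java. Gecco integrates excellent frameworks such as jsoup, httpclient, fastjson, spring, htmlunit, and redission, so you only need to configure a few jQuery-style selectors to write a crawler quickly. The Gecco framework has…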
☆12 · Updated Mar 9, 2017 (9 years ago)
qumingxing / framework-nio
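A lightweight message transport framework based on Java NIO. Main features: text message transport, binary file transport, mixed text and binary transport, custom encryption algorithms for message transport, synchronous or asynchronous transport, built-in heartbeat monitoring in the client and server frameworks, server-side authentication, and automatic client reconnection after a network disconnection.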
☆44 · Updated May 12, 2017 (9 years ago)
drogba321 / easy-recommender
A general-purpose processing framework for personalized recommendation algorithms, based on Mahout and Lucene.
☆18 · Updated May 25, 2015 (11 years ago)
ljcan / Java_Review
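A continuously updated collection of study notes (covering data structures, operating systems, computer networks, databases, Java, Scala, backend development, SQL and NoSQL, big data, data mining, and more).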
☆14 · Updated Mar 4, 2021 (5 years ago)
WHUIR / RAGE
The TensorFlow implementation of "Review-driven Answer Generation for Product-related Questions in E-commerce", WSDM 2019.
☆24 · Updated Nov 5, 2022 (3 years ago)
dgyuanjun / SpringBoot-Vue
Integrates SpringBoot with Vue to separate the frontend from the backend, using the SSM framework for the server-side Java code.
☆19 · Updated Mar 13, 2018 (8 years ago)
awnuxkjy / recommend-system
Recommender system study
☆66 · Updated Oct 27, 2013 (12 years ago)
will110 / flyutils
Utility helpers
☆14 · Updated Oct 14, 2019 (6 years ago)
GongDexing / database-sync
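A powerful, flexibly configurable tool for synchronizing data between databases, written in Java. It can run multiple synchronization tasks, and the synchronization schedule can be configured with cron expressions.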
☆46 · Updated Jul 17, 2016 (10 years ago)
milinda / samza-sql
SamzaSQL: Streaming SQL implementation on top of Apache Samza and Apache Kafka
☆30 · Updated Jun 8, 2016 (10 years ago)
0xqq / ETL-1
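Basic data cleaning utilities covering dates, times, numbers, strings, characters, currency, databases (mysql, postgresql, mongodb, hbase, hdfs, memcached), encryption and decryption (md5, sha, base64, aes, rsa), files, HTTP services, regular expressions, and more; it will continue to be updated.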
☆13 · Updated Jul 25, 2018 (8 years ago)
jiangxinlingdu / nettydemo
Learning Netty. WeChat official account: 匠心零度 (follow for more past articles).
☆18 · Updated Nov 2, 2019 (6 years ago)
sunpeak / Mind-Map
A collection of security-related mind maps.
☆11 · Updated Sep 7, 2015 (10 years ago)
fmyblack / textClassify
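This text classification project is aimed mainly at machine-learning beginners and at people testing text classification performance. It includes several classification algorithms (naive Bayes, cosine similarity, and logistic regression) and MM (maximum matching) and RMM (reverse maximum matching) word segmenters, along with more than 6,000 articles in multiple categories crawled from a news site and a Chinese dictionary. The project makes it easy to plug in new classifiers and segmenters and to test classification performance by combining them.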
☆37 · Updated Sep 29, 2017 (8 years ago)
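The MM and RMM segmenters mentioned in the entry above are conventionally forward and reverse maximum matching: scan the text and, at each position, take the longest dictionary word that fits, falling back to a single character when nothing matches. A minimal Python sketch of both follows; it is not that project's implementation, and the function names and the toy dictionary are assumptions for illustration.

```python
def forward_max_match(text: str, dictionary: set, max_len: int = 0) -> list:
    """Forward maximum matching (MM): greedily take the longest dictionary word from the left."""
    max_len = max_len or max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text: str, dictionary: set, max_len: int = 0) -> list:
    """Reverse maximum matching (RMM): greedily take the longest dictionary word from the right."""
    max_len = max_len or max((len(w) for w in dictionary), default=1)
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected right to left
    return words
```

With the toy dictionary {"研究", "研究生", "生命", "起源"}, the phrase "研究生命起源" ("studying the origin of life") is split by MM as 研究生 / 命 / 起源 and by RMM as 研究 / 生命 / 起源, the classic case where the reverse direction recovers the intended reading.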