CreekLou / simhash
An efficient algorithm for text similarity computation
☆61Updated 3 years ago
Alternatives and similar repositories for simhash:
Users who are interested in simhash are comparing it to the libraries listed below
- A simple implementation of the simhash algorithm in Java.☆155Updated 4 years ago
- Standalone (single-machine) Java implementation of Simhash☆107Updated 2 years ago
- ☆24Updated 7 years ago
- Chinese word segmentation tool; a Java implementation of THULAC.☆85Updated 3 years ago
- Tree-split has moved to a new home; apologies for any inconvenience.☆56Updated 8 years ago
- Chinese text classification implemented with Spark NaiveBayes. Use Spark NaiveBayes for text classifi…☆25Updated 6 years ago
- The missing SVM-based text classification module implementing HanLP's interface☆47Updated 7 years ago
- Recommender system study☆67Updated 11 years ago
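- A Java framework covering core programming, artificial intelligence, digital image processing, natural language processing, recommendation and search, and cloud services.☆87Updated 2 years ago
- A dictionary-based scoring algorithm for negative public-opinion information.☆26Updated 10 years ago
- Centralized, unified management of the resources of the word segmenter through a web server☆17Updated 7 years ago
- A word2vec implementation for the Chinese language, based on deeplearning4j and ansj☆28Updated 3 years ago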
- FoolNLTK java version☆82Updated 5 years ago
- Solr book Example Code☆53Updated 2 years ago
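- A Bloom filter implementation backed by a Java bit set, Redis, or another store that you implement yourself.☆106Updated 5 years ago
- An algorithm for automatically extracting the main text of web pages, implemented in Java☆107Updated 7 years ago
- Computes the similarity between two feature vectors☆26Updated 5 years ago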
- Document preprocessing for preparing formatted input data which is suitable for LibSVM tool.☆50Updated 7 years ago
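- A Map/Reduce-based crawler that extracts article text from major news websites and performs classification and clustering☆76Updated 11 years ago
- Chinese text classification supporting both file and text input, based on a multinomial naive Bayes classifier. Because the real-world use at work is binary classification, and because building a map of word vectors for every class attribute could cause memory problems, only binary classification is currently supported. Reusing this structure to extend it to multi-class classification would of course be easy. The main reason for writing it from scratch is not having carefully studied mah…☆23Updated 8 years ago
- Sensitive-word filtering tool☆24Updated 11 years ago
- Java implementation of keyword extraction with the TextRank algorithm☆201Updated 9 years ago
- A further wrapper around Word2VEC_java, written by ansj, that also implements common word-similarity and sentence-similarity computations.☆180Updated 2 years ago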
- ltp4j: Language Technology Platform For Java☆162Updated 3 years ago
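- Using the gecco crawler together with Spring☆52Updated 7 years ago
- A customized, precise short-text search service☆18Updated 3 years ago
- A smart Chinese pinyin tokenizer plugin that runs on Elasticsearch, supporting full-pinyin, initial-letter, and mixed Chinese search☆156Updated last year
- An Elasticsearch tokenizer plugin based on HanLP☆158Updated 3 years ago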