CreekLou / simhash
An efficient algorithm for text similarity computation
☆60Updated 3 years ago
Related projects
Alternatives and complementary repositories for simhash
- A simple implementation of the simhash algorithm in Java.☆154Updated 4 years ago
- Single-machine Java implementation of Simhash☆106Updated 2 years ago
- Chinese word segmentation tool; a Java implementation of THULAC.☆85Updated 3 years ago
- Tree-split has moved to a new home. We sincerely apologize for any inconvenience.☆57Updated 8 years ago
- A word2vec implementation for Chinese, based on deeplearning4j and ansj☆28Updated 3 years ago
- An algorithm for automatically extracting the main text of web pages, implemented in Java☆107Updated 7 years ago
- ☆24Updated 7 years ago
- ltp4j: Language Technology Platform For Java☆162Updated 3 years ago
- Chinese text classification using Spark NaiveBayes. Use Spark NaiveBayes for text classifi…☆25Updated 6 years ago
- The missing SVM-based text classification module implementing HanLP's interface☆47Updated 6 years ago
- Recommender system study☆67Updated 11 years ago
- mltk web edition☆40Updated 8 years ago
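- HanLP tests☆16Updated 7 years ago
- Elasticsearch word segmentation plugin based on HanLP☆156Updated 3 years ago
- Centralized, unified management of the word segmenter's resources via a web server☆17Updated 7 years ago
- A dictionary-based scoring algorithm for negative public-opinion information.☆25Updated 9 years ago
- A customized, precise short-text search service☆18Updated 3 years ago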
- A Bloom filter implementation backed by a Java bit set, Redis, or another store that you implement yourself.☆106Updated 5 years ago
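- HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr☆297Updated 4 years ago
- Zhihu crawler based on the webmagic framework. A Java web spider based on webmagic.☆68Updated 8 years ago
- Spring integration with webmagic, mybatis, and dungproxy☆29Updated last year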
- Spider_SinaTweetCrawler, to crawl tweet content from sinaTweet. (java)☆23Updated 7 years ago
- IKAnalyzer Chinese word segmenter☆33Updated 3 years ago
- A Java toolkit for querying ES with SQL; supports parsing SQL into DSL, a JDBC driver, and integration with Mybatis and Spring☆122Updated 7 years ago
- HanLP Analysis for Elasticsearch☆89Updated 5 years ago
- FoolNLTK java version☆82Updated 5 years ago
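- Chinese spell-checking tool that detects incorrect wording in Chinese text and suggests corrections☆35Updated 6 years ago
- A Map/Reduce-based crawler that extracts article text from major news sites and performs classification and clustering☆76Updated 10 years ago
- Elasticsearch synonym hot-reload plugin; supports updates from local files and from remote files over HTTP, and fixes several bugs.☆36Updated 7 years ago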