17zuoye / detdup
Detect duplicated items. A content deduplication framework.
☆11Updated 9 years ago
Alternatives and similar repositories for detdup:
Users that are interested in detdup are comparing it to the libraries listed below
- tyccl (同义词词林, Tongyici Cilin) is a Ruby gem that provides friendly functions to analyse similarity between Chinese words.☆46Updated 11 years ago
- A Chinese word segmentation tool based on a Bayes model☆80Updated 11 years ago
- Chinese natural language processing toolkit☆86Updated 9 years ago
- A Python package for pullword.com☆86Updated 4 years ago
- Yet another Chinese word segmentation package based on character-based tagging heuristics and CRF algorithm☆245Updated 12 years ago
- My GitHub Hubot scripts.☆12Updated 9 years ago
- ☆68Updated 9 years ago
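- A Python implementation of "General Web Page Main-Text Extraction Based on Line-Block Distribution Functions" (《基于行块分布函数的通用网页正文抽取》)☆30Updated 10 years ago
- Parser for Sogou Input Method cell dictionaries (细胞词库)☆15Updated 11 years ago
- Detect duplicated items framework. A content deduplication framework.☆12Updated 9 years ago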
- a text analyzing (match, rewrite, extract) engine (python edition)☆80Updated 7 years ago
- Thank-you-follow-me Ha Ha Ha!☆42Updated 9 years ago
- A Chinese Webpage Title Text Categorization Tool (short-text classification); pure c/c++ version: https://github.com/MagnusBai/webpage_categorizati…☆20Updated 7 years ago
- ☆21Updated 6 years ago
- stan-cn-nlp: an API wrapper based on Stanford NLP packages for the convenience of Chinese users☆57Updated 8 years ago
- DayBit is a text-based interactive game that uses Tornado as its backend framework.☆12Updated 9 years ago
- A tool for collecting and managing snippets☆8Updated 7 years ago
- A bundle of html content extraction algorithms☆121Updated 10 years ago
- [Deactivated] search engine for v2ex☆140Updated 9 years ago
- Comparative analysis of word usage between chapters 1 to 80 and chapters 80 to 120 of 《A Dream of Red Mansions》.☆76Updated 6 years ago
- The offline part of icytranslate (an English-Chinese translation platform); the output of this project should be a translation model☆19Updated 7 years ago
- sina weibo crawler☆46Updated 10 years ago
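- 【CC-BY-4.0】In the winter of 2017, following a major fire, Beijing launched a "special campaign to investigate, clean up, and rectify safety hazards," using the occasion to evict large numbers of migrant workers living in rental apartments, industrial parks, and similar places. The campaign came abruptly and affected dozens of village- and town-level units and more than a million people. Initium Media (端傳媒), working remotely with civil organizations and student volunteers, compiled 292 valid data entries and organized…☆26Updated 6 years ago
- A study of SNS users' interactive learning behavior☆45Updated 10 years ago
- Auto-generate Chinese words from a huge text.☆91Updated 10 years ago
- An OCR client using the Baidu API☆54Updated 7 years ago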
- Distributed text analysis suite based on Celery☆95Updated 2 years ago
- java neural network☆16Updated 8 years ago
- Academic Search Engine using Scrapy, MongoDB, Lucene/Solr, Tika, Struts2, Jquery, Bootstrap, D3, CAS☆99Updated 11 years ago
- the Chinese NLP full stack toolkit☆41Updated 10 years ago