lxw0109 / CJOSpider
A spider (with and without Scrapy) for crawling data from China Judgements Online (中国裁判文书网).
☆21Updated 7 years ago
Alternatives and similar repositories for CJOSpider
Users that are interested in CJOSpider are comparing it to the libraries listed below
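- Evaluation of Chinese word segmentation tools☆61Updated 2 years ago
- A simple trie implemented in Python that supports adding, looking up, and deleting keywords; used for keyword matching, stop-word removal, and similar tasks on Chinese text.☆64Updated 5 years ago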
- Event monitor based on an online news corpus, including event storyline and analysis. Given event keywords, it collects news about the event, then mines and analyzes it.☆152Updated 6 years ago
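- Crawls news summary-title pairs from portal sites and uses seq2seq to generate titles from summaries.☆45Updated 7 years ago
- For personal study. Please star or fork the original author's repository.☆27Updated 10 years ago
- Obtaining an offline Wikipedia corpus☆28Updated 7 years ago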
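- openlaw data crawler v1.1, updated 2017.12.16. Handles several encryption schemes in the new version of openlaw. Adds celery for easy asynchronous, distributed crawling, doubling crawl speed again.☆57Updated 6 years ago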
- Chinese named entity recognition (company names), Tensorflow 1.3 + Python3☆38Updated 7 years ago
- Unsupervised Automatic Generation of Chinese Fake Reviews.☆83Updated 5 years ago
- Syntax and rule-based document sentiment analysis: a demo of document-level sentiment analysis based on dependency syntax rules.☆107Updated 6 years ago
- Intelligent customer service☆105Updated 5 years ago
- BosonNLP HTTP API wrapper library (SDK)☆163Updated 6 years ago
- A simple and useful platform for entity tagging using tornado.☆25Updated 5 years ago
- The missing SVM-based text classification module implementing HanLP's interface☆47Updated 7 years ago
- Self-implemented SpellCorrection based on pinyin similarity and edit distance, for query correction.☆82Updated 3 years ago
- E-Commerce Sentiment Dict☆130Updated 6 years ago
- Short text similarity☆103Updated 3 years ago
- Self-implemented BaiduIndexSpyder based on Selenium, with index image decoding and number image conversion; automatically collects historical Baidu search index data by keyword.☆42Updated 7 years ago
- Train Wikidata with word2vec for word embedding tasks☆123Updated 6 years ago
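- Toutiao (今日头条) crawler that mainly crawls keyword search results; includes an edit distance algorithm, singular value decomposition, and k-means clustering.☆72Updated 5 years ago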
- Sequential event experiment based on travel notes crawled from XieCheng (Ctrip): collection of 500,000 travel notes and construction of a sequential event graph.☆183Updated 6 years ago
- Computes Chinese text similarity using TF feature vectors and simhash fingerprints☆216Updated 8 years ago
- Self-implemented WeiboIndexSpyder based on Selenium; collects the Sina Weibo Index (微指数), including the overall, mobile, and PC indices.☆31Updated 7 years ago
- Chinese Microblog Automatic Summary System☆30Updated 6 years ago
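- A topic-sentence extraction method based on title classification: given a news report, compute the similarity between the title and the set of news topic words to decide whether the title is indicative. For an indicative title, extract the sentence in the report most similar to it as the topic sentence; otherwise, combine multiple features to score the importance of each sentence and take the highest-scoring one as the topic sentence.☆40Updated 8 years ago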
- Opinion mining for e-commerce reviews☆39Updated 5 years ago
- Word similarity computation based on Tongyici Cilin☆120Updated 7 years ago
- Synonym expansion☆27Updated 9 years ago
- Deduplicating massive text collections with Simhash☆12Updated 7 years ago
- Building a character relationship graph for Jin Yong's novels☆62Updated 5 years ago