pyunits / pyunit-sogou
Sogou lexicon download module
☆18 Updated last year
Alternatives and similar repositories for pyunit-sogou:
Users that are interested in pyunit-sogou are comparing it to the libraries listed below
- New word discovery based on word frequency, cohesion coefficient, and left/right adjacency information entropy ☆123 Updated 5 years ago
- A corpus of Chinese abbreviations, including negative full forms. ☆193 Updated 3 years ago
- Chinese named entity recognition (company names), TensorFlow 1.3 + Python 3 ☆38 Updated 7 years ago
- E-Commerce Sentiment Dict ☆128 Updated 6 years ago
- An optimized Python implementation of the Aho-Corasick (AC) automaton, mainly fixing inaccurate query results. ☆73 Updated 3 years ago
- Short-text similarity ☆103 Updated 3 years ago
- A curated list of Chinese corpus resources for NLP (Natural Language Processing) ☆74 Updated 5 years ago
- Self-implemented query spelling correction (SpellCorrection) based on pinyin similarity and edit distance. ☆82 Updated 2 years ago
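- A simple trie structure implemented in Python that supports adding, looking up, and deleting keywords; used for keyword matching, stop-word removal, and similar tasks on Chinese text. ☆64 Updated 4 years ago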
- New word discovery algorithm (NewWordDetection) ☆92 Updated 3 years ago
- SmoothNLP domain vocabulary examples, based on the Fudan public news corpus ☆49 Updated 5 years ago
- Self-implemented Pinyin2Chinese demo: a simple demonstration of pinyin segmentation and pinyin-to-Chinese conversion based on a trie and a hidden Markov model (HMM). ☆86 Updated 6 years ago
- Faster and more effective Chinese new word discovery ☆512 Updated 11 months ago
- Text classification benchmark ☆25 Updated 6 years ago
- Pre-trained Chinese XLNet model: Pre-Trained Chinese XLNet_Large ☆229 Updated 5 years ago
- Chinese short-sentence similarity ☆137 Updated 6 years ago
- Performance evaluation of the major Chinese word segmenters ☆156 Updated 6 years ago
- Toutiao Chinese news text (multi-level) classification dataset ☆396 Updated 3 years ago
- Character-based training of word vectors ☆88 Updated 6 years ago
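- A Chinese character feature extraction tool that can extract pronunciation (initial, final, tone), glyph (components, radicals), four-corner code, and other features from characters, and can also supply them as tensor input to a model ☆134 Updated 4 years ago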
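- Fine-tuning pre-trained language models (BERT, Roberta, XLBert, etc.) to compute the similarity between two texts (cast as a sentence-pair classification task); suitable for Chinese text ☆89 Updated 4 years ago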
- Code for a Chinese error detection module using n-grams and a Bi-LSTM ☆135 Updated 5 years ago
- siamese dssm sentence_similarity sentece_similarity_rank tensorflow ☆60 Updated 6 years ago
- cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information ☆273 Updated last year
- A Chinese corpus annotated with part-of-speech tags ☆201 Updated 10 years ago
- User review tag mining ☆71 Updated 7 years ago
- NLP NER datasets video/music/book bio ☆88 Updated 4 years ago
- Python 3 version of Time-NLP: Chinese time expression recognition ☆88 Updated 4 years ago
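- Chinese, word segmentation, word lists, core dictionary, event word list, stop words, sensitive words, Q&A, Q&A data, knowledge graphs, text corpora. ☆159 Updated 3 years ago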
- Chinese sentence similarity computation based on the gensim module ☆53 Updated 6 years ago