yuikns / icwb2-data
This directory contains the training, test, and gold-standard data used in the 2nd International Chinese Word Segmentation Bakeoff. Also included is the script used to score the results submitted by the bakeoff participants and the simple segmenter used to generate the baseline and topline data.
☆68 Updated 7 years ago
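Word segmentation output is usually scored at the word level with precision, recall, and F1 against a gold standard. The following is a minimal sketch of that idea, not the scoring script shipped in this directory; the function names `word_spans` and `score_segmentation` are hypothetical, and the input is assumed to be one space-separated segmented line per call.

```python
def word_spans(words):
    """Map a list of words to a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def score_segmentation(gold_line, pred_line):
    """Return (precision, recall, f1) for one segmented line.

    Assumption: both lines are whitespace-separated segmentations
    of the same underlying character sequence.
    """
    gold = gold_line.split()
    pred = pred_line.split()
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted lines must cover the same characters")

    gold_spans = word_spans(gold)
    pred_spans = word_spans(pred)

    # A predicted word is correct only if both of its boundaries match the gold word.
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = "共同 创造 美好 的 新 世纪"
    pred = "共同 创造 美 好 的 新世纪"
    print(score_segmentation(gold, pred))
```

Comparing character-offset spans rather than word strings keeps a repeated word from being counted as correct when it appears at the wrong position.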
Alternatives and similar repositories for icwb2-data
Users who are interested in icwb2-data are comparing it to the repositories listed below
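- Performance benchmarks of major Chinese word segmenters ☆158 Updated 6 years ago
- NLP NER datasets video/music/book bio ☆90 Updated 4 years ago
- Word similarity computation based on Tongyici Cilin ☆120 Updated 8 years ago
- Education-industry news corpus for automatic summarization ☆200 Updated 7 years ago
- New word discovery based on word frequency, cohesion coefficient, and left/right adjacency entropy ☆122 Updated 5 years ago
- Chinese NER (named entity recognition) corpora, available in one place ☆130 Updated 5 years ago
- Dependency parsing, NLP (natural language processing) ☆85 Updated 3 years ago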
- Simple Solution for Multi-Criteria Chinese Word Segmentation ☆302 Updated 4 years ago
- Code for a Chinese error detection module using n-grams and Bi-LSTM ☆135 Updated 6 years ago
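- Sequence labeling based on Universal Transformer (Transformer encoder) and CRF; Chinese word segmentation and part-of-speech tagging with Universal Transformer + CRF ☆158 Updated 6 years ago
- Unsupervised word segmentation and syntactic parsing based on BERT ☆110 Updated 5 years ago
- ChineseTextualInference: a Chinese textual inference project, including translation and construction of an 880,000-pair Chinese textual entailment dataset and a deep-learning-based textual entailment model… ☆174 Updated 6 years ago
- General-purpose sequence labeling library based on TensorFlow & PaddlePaddle (currently includes BiLSTM+CRF, Stacked-BiLSTM+CRF, and IDCNN+CRF, with more algorithms being added) for Chinese word segmentation (tokenizer / segmentation), part-of-speech tagging… ☆84 Updated 2 years ago
- BERT fine-tuning on Chinese corpora (Fine-tune Chinese for BERT) ☆81 Updated 6 years ago
- A simple trie implemented in Python that supports adding, looking up, and deleting keywords; useful for keyword matching, stop-word removal, and similar tasks on Chinese text. ☆64 Updated 5 years ago
- New word discovery algorithm (NewWordDetection) ☆62 Updated 7 years ago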
- Chinese and English corpus processing scripts in Python, C++, and Java ☆197 Updated 6 years ago
- DistilBERT for Chinese: a distilled BERT model pre-trained on large-scale Chinese data ☆92 Updated 5 years ago
- SMP2017 Chinese human-computer dialogue evaluation data ☆107 Updated 7 years ago
- Simple self-implemented Pinyin2Chinese demo using a trie and a hidden Markov model (HMM) for pinyin segmentation and pinyin-to-Chinese conversion ☆86 Updated 7 years ago
- BERT fine-tuning for CMRC2018, CJRC, DRCD, CHID, C3 ☆183 Updated 5 years ago
- Converts Baidu's ERNIE PaddlePaddle model to a TensorFlow model ☆177 Updated 5 years ago
- Neural Chinese Address Parsing ☆122 Updated 6 years ago
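- SmoothNLP domain vocabulary examples, based on Fudan's public news corpus ☆49 Updated 5 years ago
- A simple module for extracting opinions from reviews, based on LTP ☆116 Updated 6 years ago
- Chinese character feature extraction tool; extracts pronunciation (initial, final, tone), glyph shape (components, radicals), four-corner code, and other features, and can also feed them to a model as tensors ☆136 Updated 5 years ago
- NLP toolkit based on the minimum entropy principle ☆137 Updated 3 years ago
- Chinese semantic role labeling based on Bi-LSTM and CRF ☆87 Updated 6 years ago
- Word segmentation with CRF++ in Python ☆37 Updated 7 years ago
- New word discovery algorithm (NewWordDetection) ☆92 Updated 4 years ago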