explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆57 Updated 6 months ago
Alternatives and similar repositories for spacy-pkuseg:
Users who are interested in spacy-pkuseg are comparing it to the libraries listed below:
- Performance benchmarks of major Chinese word segmentation tools ☆157 Updated 6 years ago
- CINO: Pre-trained Language Models for Chinese Minority Languages ☆233 Updated 2 years ago
- Improves the accuracy of pypinyin using g2pW ☆86 Updated last year
- Grapheme-to-Phoneme lexicons for Chinese dialects ☆67 Updated 2 years ago
- A convenient Chinese word segmentation tool ☆46 Updated 2 months ago
- PERT: Pre-training BERT with Permuted Language Model ☆359 Updated last year
- A large, high-quality corpus of Chinese synonyms ☆43 Updated 3 years ago
- Chinese text error correction ☆92 Updated 3 years ago
- Overrides the built-in pinyin data in pypinyin with the pinyin data files from pinyin-data and phrase-pinyin-data ☆56 Updated 2 months ago
- Dynamic Voice Actor Assignment and Emotional Narration for Realistic Story Play ☆40 Updated last month
- Chinese text error correction based on BERT ☆232 Updated last year
- LERT: A Linguistically-motivated Pre-trained Language Model ☆204 Updated last year
- ☆172 Updated 2 years ago
- Annotated Chinese corpus of the People's Daily, January to April 1998 ☆30 Updated 6 years ago
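- Python | Efficient use of the statistical language model kenlm: new word discovery, word segmentation, intelligent error correction, and more ☆163 Updated 5 years ago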
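- Chinese, word segmentation, word lists, core dictionary, event word list, stop words, sensitive words, question answering, QA data, knowledge graph, text corpora. ☆159 Updated 3 years ago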
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux ☆49 Updated last month
- A simple Pinyin2Chinese demo: pinyin segmentation and pinyin-to-Chinese conversion based on a hidden Markov model (HMM) and a Trie ☆86 Updated 6 years ago
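- Chinese character feature extraction tool; extracts features such as pronunciation (initial, final, tone), glyph structure (components, radicals), and four-corner codes, which can also be fed to a model as tensors ☆134 Updated 4 years ago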
- MiniRBT (a series of small Chinese pre-trained models) ☆269 Updated last year
- Tool for time expression extraction, parsing, and normalization ☆51 Updated 2 years ago
- 🌈 NERpy: Implementation of Named Entity Recognition using Python. Supports models such as BertSoftmax and BertSpan and works out of the box. ☆113 Updated last year
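- Early Modern Chinese corpus dataset (keywords: natural language processing, corpus, ancient Chinese, Classical Chinese, digital humanities, computational linguistics) ☆153 Updated 3 weeks ago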
- Mapping table between Traditional and Simplified Chinese characters ☆40 Updated 2 months ago
- PyTorch model for https://github.com/imcaspar/gpt2-ml ☆79 Updated 3 years ago
- A bot that converts text to vectors, built on sentence-transformers ☆45 Updated 2 years ago
- Chinese punctuation model that can add punctuation marks to text. ☆140 Updated 3 months ago
- Code for a Chinese error detection module using n-grams and a Bi-LSTM ☆135 Updated 5 years ago
- ☆75 Updated 2 years ago
- Parallel corpus for translation between Classical and Modern Chinese ☆101 Updated 3 years ago