explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆53 Updated 2 months ago
Related projects
Alternatives and complementary repositories for spacy-pkuseg
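- A large, high-quality corpus of Chinese synonyms. ☆37 Updated 3 years ago
- A Chinese punctuation model that can add punctuation to text. ☆130 Updated 9 months ago
- A tool for time expression extraction, parsing, and normalization. ☆49 Updated 2 years ago
- Chinese sentence segmentation and punctuation restoration, implemented in PyTorch 1.0. ☆55 Updated 5 years ago
- Performance benchmarks of major Chinese word segmentation tools. ☆154 Updated 5 years ago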
- Estimate the phonetic distance between Chinese words and get similar-sounding candidate words. ☆35 Updated last year
- CINO: Pre-trained Language Models for Chinese Minority Languages. ☆215 Updated last year
- Chinese text error correction. ☆91 Updated 2 years ago
- 渊: A project for Classical Chinese. ☆94 Updated 2 years ago
- Flow mirror models from JZX AI Labs. ☆40 Updated last month
- A lexicon of Chinese homophonous words and characters (Chinese Homophones). ☆96 Updated 5 years ago
- A high-performance text tokenizer library. ☆27 Updated 9 months ago
- Improves the accuracy of pypinyin using g2pW. ☆78 Updated last year
- Grapheme-to-Phoneme lexicons for Chinese dialects. ☆66 Updated 2 years ago
- A large-scale Chinese corpus. ☆38 Updated 5 years ago
- A tool for generating sentence embeddings based on pre-trained models. ☆132 Updated last year
- A BERT-CNN-LSTM model for punctuation restoration. ☆55 Updated last year
- Chinese Couplets Dataset without vulgar words. ☆69 Updated 4 years ago
- Annotated Chinese corpus from People's Daily, January-April 1998. ☆29 Updated 6 years ago
- A convenient Chinese word segmentation tool. ☆46 Updated 3 months ago
- Chinese text rewriting. ☆19 Updated 4 years ago
- A parallel corpus for translation between Classical and modern Chinese. ☆96 Updated 2 years ago
- Chinese text error correction based on BERT. ☆226 Updated last year
- A Python toolkit for file processing, text cleaning, and data splitting. ☆29 Updated 2 years ago
- Code for a Chinese error detection module using n-grams and a Bi-LSTM. ☆131 Updated 5 years ago
- LERT: A Linguistically-motivated Pre-trained Language Model. ☆202 Updated last year
- A simple self-implemented Pinyin2Chinese demo for pinyin segmentation and pinyin-to-Chinese conversion, using a trie and a hidden Markov model (HMM). ☆84 Updated 6 years ago