explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆62Updated this week
Alternatives and similar repositories for spacy-pkuseg
Users who are interested in spacy-pkuseg are comparing it to the libraries listed below
- 渊 - A project for Classical Chinese☆105Updated 3 years ago
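- A large, high-quality corpus of Chinese synonyms.☆58Updated 3 years ago
- A convenient Chinese word segmentation tool☆46Updated 2 months ago
- CINO: Pre-trained Language Models for Chinese Minority Languages☆247Updated 2 years ago
- The best tool for converting Chinese numerals to Arabic digits. Handles many Chinese expressions such as "点二八" (point two eight) and "负百分之四十" (negative forty percent). Essential for NLP and robot engineering.☆367Updated 2 years ago
- Performance benchmarks of major Chinese word segmentation tools☆158Updated 6 years ago
- SuperCLUE琅琊榜 (Langya Leaderboard): an anonymous head-to-head evaluation benchmark for general-purpose Chinese large language models☆145Updated last year
- 📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)☆733Updated 6 months ago
- Chinese Couplets Dataset without vulgar words or sensitive content.☆75Updated 5 years ago
- A small package to fuzzy match Chinese words☆88Updated 2 years ago
- Automatically corrects English spelling errors in Word documents using text analysis algorithms and Python scripts☆47Updated 6 years ago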
- Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.☆542Updated last year
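- A Chinese punctuation model that adds punctuation marks to text.☆142Updated 6 months ago
- MiniRBT (a series of small Chinese pre-trained models)☆285Updated 2 years ago
- Large-scale Chinese corpus☆42Updated 5 years ago
- Chinese text error correction☆92Updated 3 years ago
- Parallel corpus for translation between Classical and Modern Chinese☆108Updated 3 years ago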
- PERT: Pre-training BERT with Permuted Language Model☆363Updated this week
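- Chinese, word segmentation, word lists, core dictionaries, event word lists, stop words, sensitive words, question answering, QA data, knowledge graphs, text corpora.☆165Updated 3 years ago
- A tool for time expression extraction, parsing, and normalization☆53Updated 2 years ago
- A Chinese character dataset with related information such as stroke count, radical, pinyin, and English definitions/synonyms.☆120Updated 5 years ago
- Chinese MobileBERT☆94Updated 3 years ago
- A simple Pinyin2Chinese demo for pinyin segmentation and pinyin-to-Chinese conversion, using a trie and a hidden Markov model (HMM).☆86Updated 7 years ago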
- ☆21Updated 3 years ago
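- Compares the sound and shape of 6,700 commonly used Chinese characters and outputs lists of similar-sounding and similar-looking characters. # Similar characters☆461Updated last year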
- Mimix: A Text Generation Tool and Pretrained Chinese Models☆155Updated 8 months ago
- Pinyin data for Chinese words and phrases☆489Updated 3 months ago
- Estimate the phonetic distance between Chinese words and get similar sounding candidate words.☆37Updated 2 months ago
- Chinese text error correction based on BERT☆235Updated 2 years ago
- ChatGLM-6B-Slim: ChatGLM-6B with 20K image tokens pruned; identical performance with a smaller GPU memory footprint.☆126Updated 2 years ago