explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆62Updated last month
Alternatives and similar repositories for spacy-pkuseg
Users who are interested in spacy-pkuseg are comparing it to the libraries listed below.
- A large, high-quality corpus of Chinese synonyms.☆61Updated 3 years ago
- A convenient Chinese word segmentation tool☆48Updated 3 months ago
- 📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)☆738Updated 8 months ago
- A Chinese punctuation model that adds punctuation marks to text.☆143Updated 8 months ago
- 渊 - A project for Classical Chinese☆106Updated 3 years ago
- CINO: Pre-trained Language Models for Chinese Minority Languages☆252Updated last month
- Performance benchmarks of the major Chinese word segmentation tools☆158Updated 6 years ago
- PERT: Pre-training BERT with Permuted Language Model☆365Updated last month
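- The best tool for converting Chinese numerals to Arabic digits. Handles many Chinese expressions such as "点二八" (point two eight) and "负百分之四十" (negative forty percent). Essential for NLP and robotics engineering.☆372Updated 2 years ago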
- Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.☆545Updated last year
- Chinese Couplets Dataset without vulgar words.☆76Updated 5 years ago
- SuperCLUE Langya Leaderboard: an anonymous head-to-head evaluation benchmark for general-purpose Chinese large language models☆145Updated last year
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux☆62Updated 6 months ago
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue).☆67Updated last year
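- A bot that converts text to vectors, built on sentence-transformers☆47Updated 3 years ago
- MiniRBT (a series of small Chinese pre-trained models)☆291Updated last month
- Overrides the built-in pinyin data in pypinyin with the pinyin data files from pinyin-data and phrase-pinyin-data☆63Updated 7 months ago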
- ☆126Updated 4 years ago
- Automatically corrects English spelling errors in Word documents using text-analysis algorithms and Python scripts☆47Updated 7 years ago
- Dynamic Voice Actor Assignment and Emotional Narration for Realistic Story Play☆39Updated 4 months ago
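- clueai toolkit: build the custom API you need with 3 lines of code in 3 minutes!☆232Updated 2 years ago
- Early Modern Chinese corpus dataset: natural language processing, corpus, Ancient Chinese, Classical Chinese, digital humanities, computational linguistics☆165Updated 5 months ago
- Chinese text similarity calculator☆158Updated 11 months ago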
- Constants used in Chinese text processing☆377Updated 8 months ago
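- Cantonese word segmentation tool☆48Updated 7 years ago
- ChatGLM-6B-Slim: ChatGLM-6B with 20K image tokens pruned; identical performance with a smaller GPU memory footprint.☆127Updated 2 years ago
- Chinese text error correction☆93Updated 3 years ago
- Improves the accuracy of pypinyin using g2pW☆100Updated 2 years ago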
- Hanzi Converter for Traditional and Simplified Chinese☆189Updated 5 years ago
- Collection of Chinese stop words, commonly used Chinese characters, and rare characters☆172Updated 6 years ago