explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆61 Updated last week
Alternatives and similar repositories for spacy-pkuseg
Users who are interested in spacy-pkuseg compare it to the libraries listed below.
- Grapheme-to-Phoneme lexicons for Chinese dialects ☆69 Updated 2 years ago
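- A large, high-quality corpus of Chinese synonyms ☆55 Updated 3 years ago
- A convenient Chinese word segmentation tool ☆46 Updated 3 weeks ago
- Improves the accuracy of pypinyin using g2pW ☆90 Updated last year
- A Chinese punctuation model that adds punctuation marks to text. ☆141 Updated 5 months ago
- Performance benchmarks of major Chinese word segmentation tools ☆157 Updated 6 years ago
- SuperCLUE Langya Leaderboard (琅琊榜): an anonymous head-to-head evaluation benchmark for general-purpose Chinese large language models ☆144 Updated 11 months ago
- CINO: Pre-trained Language Models for Chinese Minority Languages ☆245 Updated 2 years ago
- Chinese text rewriting ☆19 Updated 4 years ago
- 渊 - A project for Classical Chinese ☆104 Updated 3 years ago
- This project performs fast encoding detection and conversion on large numbers of text files to support data cleaning for the mnbvc corpus project ☆61 Updated 7 months ago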
- llama.cpp with unicode (windows) support ☆53 Updated 2 years ago
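- Chinese text error correction based on BERT ☆235 Updated last year
- Chinese datasets for AI training (continuously updated) and a map of AI companies. Current datasets: 8,000 questions from the catering industry, Baidu Zhidao, the Chinese Alpaca dataset, computer-science domain datasets, the Vicuna dataset, the RedPajama dataset, Chinese Wikipedia entries, and website forum Q&A datasets ☆57 Updated last year
- ChatGLM-6B fine-tuning. ☆135 Updated 2 years ago
- A tool for time expression extraction, parsing, and normalization ☆52 Updated 2 years ago
- Chinese error correction ☆92 Updated 3 years ago
- 🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition tool that supports models such as BertSoftmax and BertSpan and works out of the box. ☆113 Updated last year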
- Hong Kong Cantonese Corpus of transcribed speech (spontaneous speech, radio programmes and a monologue). ☆62 Updated last year
- PERT: Pre-training BERT with Permuted Language Model ☆361 Updated 2 years ago
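- MiniRBT (a series of small Chinese pre-trained models) ☆280 Updated 2 years ago
- clueai toolkit: customize the API you need in 3 lines of code and 3 minutes! ☆233 Updated 2 years ago
- The first Chinese version of the llama2 13b model (Base + Chinese dialogue SFT, enabling fluent multi-turn human-machine natural-language interaction) ☆90 Updated last year
- A parallel corpus for translation between Classical and modern Chinese ☆105 Updated 3 years ago
- Chinese Couplets Dataset without vulgar words. ☆73 Updated 5 years ago
- ChatGLM-6B-Slim: ChatGLM-6B with the 20K image tokens pruned; identical performance with a smaller GPU memory footprint. ☆126 Updated 2 years ago
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux ☆59 Updated 3 months ago
- Large-scale Chinese corpus ☆42 Updated 5 years ago
- Overrides pypinyin's built-in pinyin data with the pinyin data files from pinyin-data and phrase-pinyin-data ☆59 Updated 4 months ago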
- ☆31 Updated 2 years ago