explosion / spacy-pkuseg
The pkuseg toolkit for multi-domain Chinese word segmentation
☆51 Updated 2 weeks ago
Related projects:
- 渊 - A project for Classical Chinese ☆88 Updated 2 years ago
- CINO: Pre-trained Language Models for Chinese Minority Languages ☆204 Updated last year
- A large high-quality corpus of Chinese synonyms ☆36 Updated 2 years ago
- Performance evaluation of major Chinese word segmenters ☆151 Updated 5 years ago
- A Chinese punctuation model that can add punctuation marks to text ☆128 Updated 6 months ago
- A tool for time expression extraction, parsing, and normalization ☆48 Updated last year
- Chinese Couplets Dataset without vulgar words ☆67 Updated 4 years ago
- MiniRBT (a series of small Chinese pre-trained models) ☆244 Updated last year
- LERT: A Linguistically-motivated Pre-trained Language Model ☆195 Updated last year
- ☆26 Updated this week
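- Chinese text rewriting ☆19 Updated 3 years ago
- A pre-training-based tool for generating sentence embeddings ☆131 Updated last year
- SuperCLUE Langya Leaderboard (琅琊榜): an anonymous head-to-head evaluation benchmark for Chinese general-purpose large models ☆139 Updated 3 months ago
- A convenient Chinese word segmentation tool ☆45 Updated last month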
- ☆37 Updated 5 months ago
- A parallel corpus for translation between Classical and modern Chinese ☆95 Updated 2 years ago
- Chinese text error correction based on BERT ☆225 Updated last year
- PyTorch model for https://github.com/imcaspar/gpt2-ml ☆79 Updated 2 years ago
- Grapheme-to-Phoneme lexicons for Chinese dialects ☆67 Updated last year
- This project aims to perform fast encoding detection and conversion on large numbers of text files, to support data cleaning for the mnbvc corpus project ☆52 Updated 3 weeks ago
- Classical Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard ☆45 Updated last year
- PERT: Pre-training BERT with Permuted Language Model ☆350 Updated last year
- ☆120 Updated 3 years ago
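- Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on SuperCLUE's OPEN benchmark | Llama2 Chinese evaluation with SuperCLUE ☆127 Updated last year
- 🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition tool that supports models such as BertSoftmax and BertSpan and works out of the box. ☆111 Updated 7 months ago
- A bot that converts text to vectors using sentence-transformers ☆45 Updated 2 years ago
- rasa_chinese: a Rasa component extension package built specifically for Chinese, providing many Chinese-language components ☆141 Updated last year
- A high-performance text tokenizer library ☆23 Updated 7 months ago
- Chinese support for Rasa via PaddleNLP ☆32 Updated 2 years ago
- Training a simple machine learning model to determine whether a sentence is a question ☆15 Updated 2 years ago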