selfcs / stop-and-sensitive-words
Stop-word and sensitive-word lists
☆14 Updated 3 years ago
Related projects:
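- Baidu QA dataset with 1 million entries ☆48 Updated 9 months ago
- Automatically tracks hot GitHub repos, updated daily ☆39 Updated this week
- Chinese text rewriting ☆19 Updated 3 years ago
- Sensitive-word filter list for public security network filing (公安网备) ☆13 Updated 5 years ago
- Text deduplication algorithm based on SimHash ☆18 Updated 3 years ago
- PNW, a Chinese new-word discovery algorithm that can identify new words of any length. ☆15 Updated last year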
- ☆20 Updated 2 years ago
- ☆16 Updated 9 months ago
- Reading notes on top-conference papers relevant to NLP algorithm engineers (text matching) ☆11 Updated 2 years ago
- Large-scale exact string matching tool ☆15 Updated 11 months ago
- ☆12 Updated this week
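- Trains a large language model for novels, based on autoregressive models and existing open-source LLMs ☆23 Updated 11 months ago
- aigc evals ☆10 Updated 9 months ago
- Demo of a practice/training chatbot designed for insurance sales-script training ☆17 Updated 3 years ago
- Baidu Baike dataset with 5 million entries ☆29 Updated 9 months ago
- ☆97 Updated this week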
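- A practical tool for converting Chinese PDFs to TXT ☆30 Updated 2 years ago
- GOAT (山羊) is a Chinese-English large language model, fine-tuned (SFT) from LLaMA. ☆12 Updated last year
- Chinese character to Wubi code conversion tool ☆31 Updated 5 years ago
- TensorRT ☆11 Updated 4 years ago
- Chinese instruction dataset for fine-tuning LLMs ☆27 Updated last year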
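- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆70 Updated last year
- Generates the large volumes of data needed by text-correction models such as GECToR. ☆14 Updated last year
- A general entity, relation, and event extraction task requires the UIE model framework, deployed on an Ascend 310 server. UIE is built on ernie3.0, but Paddle does not yet officially support deploying ernie3.0 models on Ascend 310, hence the following steps. The main process: first train the model with Paddle… ☆16 Updated 2 years ago
- CCKS 2022 general information extraction ☆12 Updated 2 years ago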
- Intelligent marketing copy generation ☆34 Updated 3 years ago
- Service package for rasa_chinese ☆17 Updated 3 years ago
- GoGPT Chinese instruction dataset construction ☆10 Updated 7 months ago
- GLM (General Language Model) ☆24 Updated 2 years ago
- Perform crosstalk with Qian Yu ☆42 Updated last year