shibing624 / pinyin-tokenizer
pinyintokenizer, a pinyin tokenizer that splits a continuous pinyin string into a list of single-character pinyin syllables.
☆30 Updated 3 months ago
Alternatives and similar repositories for pinyin-tokenizer
Users who are interested in pinyin-tokenizer are comparing it to the libraries listed below.
- A sharp-tongued Weibo AI that relentlessly disses Weibo bloggers ☆12 Updated 4 months ago
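- Python3 package for Chinese/English OCR, with paddleocr-v4 onnx model (~14MB). Inference is based on the ppocr-v4-onnx model and delivers accurate OCR predictions on CPU in milliseconds; Chinese/English OCR in general scenarios reaches open-source SO… ☆81 Updated 3 months ago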
- Baidu QA dataset with 1 million entries ☆47 Updated last year
- A Python Package to Access World-Class Generative Models ☆127 Updated 11 months ago
- Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on SuperCLUE's OPEN benchmark | Llama2 Chinese evaluation with SuperCLUE ☆126 Updated last year
- ☆27 Updated 7 months ago
- A survey of large language model training and serving ☆37 Updated last year
- Automatically tracks trending GitHub repos, updated daily ☆49 Updated this week
- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser ☆46 Updated 11 months ago
- To try TextIn document parsing, visit https://cc.co/16YSIy ☆22 Updated 10 months ago
- A demo built on Megrez-3B-Instruct, integrating a web search tool to enhance the model's question-and-answer capabilities. ☆38 Updated 5 months ago
- A pretraining-based tool for generating sentence embeddings ☆137 Updated 2 years ago
- General information extraction (entity/relation/event extraction) based on the Qwen2 model ☆31 Updated 10 months ago
- share data, prompt data, pretraining data ☆36 Updated last year
- Baidu Baike dataset with 5 million entries ☆34 Updated last year
- Silk Road will be the dataset zoo for Luotuo(骆驼). Luotuo is an open-source Chinese-LLM project founded by 陈启源 @ 华中师范大学 & 李鲁鲁 @ 商汤科技 & 冷子… ☆39 Updated last year
- Uses an LLM plus a sensitive-word lexicon to automatically determine whether text contains sensitive words. ☆122 Updated last year
- llama inference for tencentpretrain ☆98 Updated last year
- Chinese text error correction ☆92 Updated 3 years ago
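- This project performs fast encoding detection and conversion on large numbers of text files to support data cleaning for the mnbvc corpus project ☆61 Updated 6 months ago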
- Parameter-efficient fine-tuning of ChatGLM-6B based on LoRA and P-Tuning v2 ☆55 Updated 2 years ago
- A tool for extracting, parsing, and normalizing time expressions ☆52 Updated 2 years ago
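- The first Chinese llama2 13b model (Base + Chinese dialogue SFT, enabling fluent multi-turn natural-language interaction between humans and the model) ☆90 Updated last year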
- A personal Chinese version of LLama3 ☆39 Updated last year
- Demonstrates the remarkable effect of vllm on Chinese large language models ☆31 Updated last year
- Luotuo QA, a Chinese large language model for reading comprehension. ☆74 Updated last year
- Scripts for bge inference optimization ☆28 Updated last year
- A simple API version of funasr speech-to-text, built with funasr + fastapi for easy deployment on a server ☆10 Updated 9 months ago
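- Chinese, word segmentation, vocabularies, core dictionary, event vocabulary, stop words, sensitive words, Q&A, Q&A data, knowledge graph, text corpora. ☆162 Updated 3 years ago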
- RelExt: A Tool for Relation Extraction from Text. Extracts entity relations from text. ☆50 Updated 2 years ago