throput / pinyinsplit
A Python library to split a Chinese Pinyin phrase into possible permutations of Chinese Pinyin words
☆12 Updated 3 years ago
Alternatives and similar repositories for pinyinsplit
Users who are interested in pinyinsplit are comparing it to the libraries listed below.
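- A simple, easy-to-use Python module for manipulating dates and times through strings. Regex-based time extraction, string time parsing, and string time extraction. Chinese time extraction: pulls times out of a sentence. ☆75 Updated 11 months ago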
- Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies. ☆50 Updated 6 years ago
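- A tool for time extraction, parsing, and normalization. ☆52 Updated 2 years ago
- Python demo for Yidun (易盾) anti-spam. ☆29 Updated 10 months ago
- Regex-based extraction and normalization of time keywords. ☆21 Updated 3 years ago
- Python 3 version of Time-NLP: Chinese time expression recognition. ☆89 Updated 5 years ago
- A Python implementation of the Aho-Corasick (AC) automaton, with optimizations. Mainly fixes inaccurate query results. ☆73 Updated 4 years ago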
- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser ☆46 Updated 11 months ago
- [SIGIR 2022] Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval ☆185 Updated 2 years ago
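- MD5 links for a Chinese book corpus. ☆218 Updated last year
- Converts Chinese time expressions into the corresponding time strings; supports time points, time ranges, and time intervals. ☆21 Updated 3 years ago
- Chinese support for Rasa via PaddleNLP. ☆33 Updated 3 years ago
- Scripts for optimizing bge inference. ☆28 Updated last year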
- Enhanced version of original AutoGPTQ (https://github.com/PanQiWei/AutoGPTQ). ☆10 Updated last year
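- An optimized version of the general web page main-text extraction algorithm based on the line-block distribution function, implemented in Python. ☆60 Updated 5 years ago
- Chinese instruction tuning datasets ☆131 Updated last year
- An MLM-based BERT pretrained model for pinyin-to-Chinese-character conversion with error correction (pinyin corrector), implemented in PyTorch. ☆45 Updated 4 years ago
- This project uses layoutlmv3 for training and inference on Chinese images. It mainly addresses three problems: 1. normalizing data into a usable training dataset format; 2. modifying tokenization for layoutlmv3-base-chinese; 3. splitting text longer than 512 and applying a sliding window. ☆48 Updated 9 months ago
- Chinese and English sensitive words, language detection, domestic and foreign mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, 汽… ☆32 Updated 6 years ago
- This project aims to perform fast encoding detection and conversion on large numbers of text files, to support data cleaning for the mnbvc corpus project. ☆61 Updated 7 months ago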
- NLU & NLG (zero-shot) based on the mengzi-t5-base-mt pretrained model ☆74 Updated 2 years ago
- Code for Chinese error detection module, using n-gram and bi-lstm ☆135 Updated 6 years ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune ☆108 Updated last year
- CDLA: A Chinese document layout analysis (CDLA) dataset ☆267 Updated 3 years ago
- Easy-to-use and Fast NLP library with awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications. ☆11 Updated last year
- ☆37 Updated 4 years ago
- Large-scale Chinese corpus ☆42 Updated 5 years ago
- Use chatGLM to perform text embedding ☆45 Updated 2 years ago
- deepResearch ☆41 Updated last month
- pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables. ☆30 Updated 4 months ago