yukiyuqichen / OCR-Toolkit
A cute toolkit for OCR with a GUI, including image preprocessing and text recognition. Works out of the box.
☆11 Updated 9 months ago
Related projects:
- Chinese character variant converter. ☆14 Updated 3 weeks ago
- [Gap-Tree Sorting Algorithm] Performs layout analysis on OCR results or PDF-extracted text and sorts the text into human reading order. ☆91 Updated 6 months ago
- Extracts PDF content using RapidOCR. ☆126 Updated 3 weeks ago
- GuwenModels: A collection of Classical Chinese natural language processing models, including Classical Ch… ☆146 Updated 9 months ago
- Book title corpus. Also includes some movie and game titles. ☆66 Updated 5 months ago
- Resource collection for "Digital Humanities Tutorial" (数字人文教程). ☆73 Updated 3 months ago
- Recognizes tables and text in scanned images that contain tables. ☆248 Updated last year
- A pre-trained LSTM model that helps segment unpunctuated historical Chinese texts. ☆21 Updated 2 years ago
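- A tool for converting and extracting Sogou cell lexicons (细胞词库) to plain text. Extracts vocabulary lists for use in deep learning, for data generation and dictionary features. ☆22 Updated 5 years ago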
- A carefully designed OCR pipeline for universal bordered table recognition and reconstruction. ☆168 Updated last year
- ☆38 Updated 5 years ago
- ☆238 Updated last month
- Medical corpus. Corpus of medical institution names. Drug standard codes (药品本位码). ☆58 Updated 5 months ago
- Recipe name corpus. ☆13 Updated 3 years ago
- Layout analysis for Chinese and English documents. ☆94 Updated 2 months ago
- Species name corpus. Plant names and animal names. ☆40 Updated 5 months ago
- A relatively complete document analysis and recognition project. ☆143 Updated 4 years ago
- pinyintokenizer: a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables. ☆26 Updated 8 months ago
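- A write-up of a Tianchi competition entry. Extracts 18 fields from PDFs: name, date of birth, gender, phone number, highest education level, native place, city or county of household registration, political affiliation, school graduated from, employer, job description, job title, project name, project responsibilities, degree, graduation date, employment period, and project period. ☆107 Updated last month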
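- Chinese, word segmentation, word lists, core dictionary, event word lists, stop words, sensitive words, question answering, QA data, knowledge graphs, text corpora. ☆144 Updated 2 years ago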
- doc2x docs ☆29 Updated 2 months ago
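- Characters and words: a collection of resources on characters, words, and pinyin for Chinese classical studies (国学) and the Chinese language. ☆27 Updated 6 years ago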
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order. ☆76 Updated 3 months ago
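- PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV), langchain, and streamlit. ☆144 Updated 11 months ago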
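- Chinese book dataset / data mining / natural language processing / Chinese Library Classification / library and information science / text classification ☆80 Updated last year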
- Chinese text error correction. ☆89 Updated 2 years ago
- Crawler for the Baidu Hanyu dictionary: pinyin data and a large set of 350,000 Baidu dictionary records. ☆21 Updated 2 years ago
- Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese documents. ☆28 Updated last year
- Making NLP something everyone can learn. Open source is not easy, so remember to star the repo. ☆101 Updated last year
- Converts PDF to Markdown using PaddleOCR. ☆49 Updated 3 months ago