Brokenice0415 / Old_Book_OCR
Ancient book recognition
☆15 Updated 4 years ago
Alternatives and similar repositories for Old_Book_OCR
Users interested in Old_Book_OCR are comparing it to the repositories listed below
- GuwenModels: a collection of Classical Chinese natural language processing models, gathering Classical Chinese models and related resources from across the internet. ☆196 Updated 2 years ago
- Resource collection for the "Digital Humanities Tutorial" ☆111 Updated last year
- 渊 - A project for Classical Chinese ☆110 Updated 3 years ago
- ☆39 Updated 2 years ago
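- A fairly comprehensive library of classical Chinese poems, ci lyrics, and prose: 210,000 poems with annotations and appreciations, more than 10,000 poets with introductions and biographies, more than 1,600 cipai (tune pattern) introductions, explanations of more than 70 Chinese dynasties, and nearly 200 classification tags for classical poetry and prose. ☆392 Updated 2 years ago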
- Jiayan (甲言), the 1st NLP toolkit designed for Classical Chinese, supports lexicon construction, word segmentation, part-of-speech tagging, sentence segmentation, and punctuation. ☆640 Updated 4 years ago
- A cute toolkit for OCR with GUI, including image preprocessing and text recognition. Works out of the box. ☆18 Updated 2 months ago
- Database of Chinese poetry, ci, songs, and fu: more than 820,000 pieces in total (827,108), CSV format, Simplified Chinese, ordered by number ☆64 Updated 11 months ago
- GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese) ☆546 Updated 4 years ago
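- A corpus of classical Chinese poetry crawled from the internet, covering poems from the pre-Qin period to the present day, 1,014,508 poems in total ☆44 Updated 3 years ago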
- Repository of classical Chinese texts ☆315 Updated 7 years ago
- SikuBERT: Pre-training Model of Siku Quanshu (四库全书) ☆150 Updated 2 years ago
- [Gap-tree sorting algorithm] Performs layout analysis on OCR results or text extracted from PDFs and sorts it into human reading order. ☆163 Updated last year
- ☆405 Updated 5 months ago
- Mimix: A Text Generation Tool and Pretrained Chinese Models ☆157 Updated last year
- "桃李" (Taoli): a large language model for international Chinese language education ☆188 Updated 2 years ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆292 Updated 4 months ago
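- Early Modern Chinese corpus dataset. Keywords: natural language processing, corpus, ancient Chinese, Classical Chinese, Literary Chinese, digital humanities, computational linguistics ☆167 Updated 10 months ago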
- [EMNLP 2024] TongGu, a classical Chinese language model. ☆56 Updated last year
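- A hanzi similarity tool: computes Chinese character similarity with an algorithm for visually similar characters. Can be used for correcting handwritten Chinese character recognition, text obfuscation, etc. ☆284 Updated last year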
- A carefully designed OCR pipeline for universal bordered table recognition and reconstruction. ☆178 Updated 2 years ago
- Easy-to-use CPM for Chinese text generation ☆532 Updated 2 years ago
- Inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope. ☆402 Updated 4 months ago
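- The most complete Chinese dictionaries ever: a categorized Chinese lexicon covering 12 major categories, including geographic information, video games, engineering applications, agriculture/forestry/animal husbandry/fishery, humanities, social sciences, everyday life, medicine and pharmaceuticals, art and design, entertainment and leisure, sports and leisure, and natural sciences. ☆85 Updated 5 years ago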
- Transformer-based OCR, with an extended application to official seals (seal recognition) ☆276 Updated 2 months ago
- A pre-trained LSTM model that helps segment unpunctuated historical Chinese texts. ☆28 Updated 4 years ago
- Document orientation classification ☆224 Updated last year
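- This project recognizes tabular data in images, extracts and processes it, and exports it to a CSV file. The whole project is deployed and demonstrated with streamlit. Technologies used: paddleocr, PPStructure, streamlit ☆34 Updated 3 years ago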
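- Chinese and English sensitive words, language detection, location/carrier lookup for Chinese and foreign mobile/landline numbers, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English simulation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, … ☆23 Updated 2 years ago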
- Chinese CLIP pre-trained model ☆421 Updated 3 years ago