ZN1010 / PEaCE
[LREC-COLING 2024] PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. Boost OCR Performance on Scientific Documents.
☆12Updated 7 months ago
Alternatives and similar repositories for PEaCE:
Users who are interested in PEaCE are comparing it to the libraries listed below.
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute☆259Updated 4 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆192Updated last month
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆254Updated 3 weeks ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆213Updated last month
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆135Updated 7 months ago
- ☆110Updated 11 months ago
- [ArXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆108Updated 3 months ago
- A Unified Toolkit for Deep Learning-Based Table Extraction☆28Updated last month
- A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating…☆157Updated 4 months ago
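- An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.☆170Updated last week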
- Document orientation classification☆207Updated 2 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, including English and Chinese)☆76Updated 4 months ago
- TianGong-AI-Unstructure☆56Updated 2 weeks ago
- Analysis of Chinese and English layouts☆156Updated 3 weeks ago
- Detect and extract table regions from images in various scenarios, and correct perspective and rotation i…☆54Updated last month
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.☆157Updated 7 months ago
- Trained Detectron2 object detection models for document layout analysis, based on the PubLayNet dataset☆25Updated last year
- Generate table images by rendering them in a browser☆211Updated 9 months ago
- Datasets and Evaluation Scripts for CompHRDoc☆31Updated 9 months ago
- ☆31Updated last month
- Repository for training LLaMa 2 models using the NERRE format.☆52Updated last year
- Document Artificial Intelligence☆138Updated last month
- Code and data for the publication "Structured information extraction from scientific text with large language models" by Dagdelen & Dunn …☆87Updated last year
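- This project trains and runs inference with layoutlmv3 on Chinese images. It mainly addresses three problems: 1. normalizing data into a usable training dataset format; 2. modifying the tokenization for layoutlmv3-base-chinese; 3. splitting text that exceeds a length of 512 and applying a sliding window.☆39Updated 4 months ago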
- Reading order, Layoutreader☆11Updated 7 months ago
- ICDAR 2024 Table OCR Model☆27Updated last month
- MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition☆90Updated 7 months ago
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".☆58Updated 5 months ago
- CDLA: A Chinese document layout analysis (CDLA) dataset☆254Updated 3 years ago
- ☆230Updated last year