kmrambo / Python-docx-Reading-paragraphs-tables-and-images-in-document-order-

The python-docx package cannot read paragraphs, tables and images in document order. It only returns all the paragraphs at once (`Document.paragraphs`), all the tables at once (`Document.tables`) or all the inline images at once (`Document.inline_shapes`), so the order in which they are interleaved is lost. Here, I provide a way to read the paragraphs, tables and images in a .docx file in document order into a dataframe in Python.

☆80
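A minimal sketch of the underlying idea, not the repository's own code: a .docx file is a ZIP archive whose `word/document.xml` stores the body's paragraphs (`w:p`) and tables (`w:tbl`) as sibling elements in document order, so walking the direct children of `w:body` recovers that order. Images appear inside paragraphs as `a:blip` elements whose `r:embed` id is resolved through `word/_rels/document.xml.rels`. To stay dependency-free this uses only `zipfile` and `xml.etree.ElementTree` rather than python-docx; the helper name `read_in_document_order` and the `type`/`content` record layout are hypothetical choices made for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML, DrawingML and OPC relationship parts.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
W = "{%s}" % NS["w"]
BLIP = "{%s}blip" % NS["a"]
R_EMBED = "{%s}embed" % NS["r"]
RELS_PART = "word/_rels/document.xml.rels"


def _text(elem):
    """Concatenate the text of every w:t element below elem."""
    return "".join(t.text or "" for t in elem.iter(W + "t"))


def read_in_document_order(path):
    """Hypothetical helper: one record per paragraph, table or image, in body order.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        # Map relationship ids (e.g. "rId5") to part targets (e.g. "media/image1.png").
        targets = {}
        if RELS_PART in zf.namelist():
            rels = ET.fromstring(zf.read(RELS_PART))
            targets = {rel.get("Id"): rel.get("Target")
                       for rel in rels.findall("rel:Relationship", NS)}

    records = []
    # Direct children of w:body appear in document order; paragraphs nested
    # inside table cells are not direct children, so they are not repeated.
    for child in root.find("w:body", NS):
        if child.tag == W + "p":
            text = _text(child)
            if text:  # skip empty paragraphs (a simplification)
                records.append({"type": "paragraph", "content": text})
            # Images sit inside the paragraph's runs; here they are listed after its text.
            for blip in child.iter(BLIP):
                records.append({"type": "image",
                                "content": targets.get(blip.get(R_EMBED))})
        elif child.tag == W + "tbl":
            rows = [[_text(tc) for tc in tr.findall("w:tc", NS)]
                    for tr in child.findall("w:tr", NS)]
            records.append({"type": "table", "content": rows})
    return records
```

Passing the result to `pandas.DataFrame(records)` gives one row per block with `type` and `content` columns. The sketch ignores headers, footers, text boxes and legacy VML images (`v:imagedata`); python-docx exposes the same body XML through `Document(path).element.body` if you prefer to wrap the children in its `Paragraph` and `Table` objects.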

Alternatives and similar repositories for Python-docx-Reading-paragraphs-tables-and-images-in-document-order-

Users who are interested in Python-docx-Reading-paragraphs-tables-and-images-in-document-order- are comparing it to the repositories listed below.


ShayHill / docx2python
Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
☆186 · Updated last week
BordiaS / layoutlm
☆95 · Updated 5 years ago
ChrizH / pdfstructure
`pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author.
☆106 · Updated last year
doc-analysis / ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
☆107 · Updated 11 months ago
BayooG / bayoo-docx
Create and modify Word documents with Python
☆147 · Updated last year
virtualsociety / ai-table-recognition
☆40 · Updated 4 years ago
DS4SD / DocLayNet
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
☆360 · Updated 2 years ago
Spico197 / CatalogExtraction
🌳CED: Catalog Extraction from Documents
☆16 · Updated 2 years ago
ppaanngggg / layoutreader
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order.
☆265 · Updated last month
ismail-mebsout / Parsing-PDFs-using-YOLOV3
Parsing PDF tables using YOLOv3
☆118 · Updated 4 years ago
Toon-nooT / notebooks
☆17 · Updated last year
DS3Lab / DocParser
☆81 · Updated 3 years ago
poteminr / instruct-ner
Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER)
☆84 · Updated last year
syw1996 / TableGPT
☆99 · Updated 4 years ago
dogatana / docx2md
☆39 · Updated 3 months ago
pymupdf / PyMuPDF-Utilities
Demos, examples and utilities using PyMuPDF
☆672 · Updated last year
jstockwin / py-pdf-parser
A Python tool to help extract information from structured PDFs.
☆408 · Updated last week
Sanster / xy-cut
☆87 · Updated 3 years ago
UBIAI / layoutlmv3FineTuning
☆34 · Updated 3 years ago
GeorgeLuImmortal / DocLLM_reimplementation
☆22 · Updated last year
CaseDrive / publaynet-models
Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset
☆28 · Updated 2 years ago
LynnHaDo / Document-Layout-Analysis
Object Detection Model for Scanned Documents
☆94 · Updated 4 months ago
Psarpei / Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
☆280 · Updated 2 years ago
doc-analysis / XFUND
XFUND: A Multilingual Form Understanding Benchmark
☆207 · Updated 3 years ago
Alpha-Innovator / StructEqTable-Deploy
A high-efficiency open-source toolkit for the table-to-LaTeX task
☆254 · Updated 7 months ago
doc-analysis / DocBank
DocBank: A Benchmark Dataset for Document Layout Analysis
☆622 · Updated 11 months ago
svjack / docvqa-gen
Document visual question answering dataset generator for English and Chinese
☆24 · Updated 2 years ago
adobe / pdfservices-python-sdk-samples
Adobe PDFServices python SDK Samples
☆154 · Updated 2 weeks ago
manikanthp / LayoutLMV3_Fine_Tuning
☆62 · Updated last year
TebooNok / HiQA
Code implementation repository for the HiQA paper
☆101 · Updated 5 months ago