ShayHill / docx2python
Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
☆176Updated this week
Alternatives and similar repositories for docx2python:
Users who are interested in docx2python are comparing it to the libraries listed below.
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.☆243Updated this week
- Adobe PDFServices python SDK Samples☆140Updated 3 months ago
- A Python tool to help extracting information from structured PDFs.☆394Updated this week
- Streamlit PDF viewer☆127Updated 2 weeks ago
- Python bindings to PDFium☆519Updated this week
- ☆173Updated this week
- Logical structure analysis for visually structured documents☆86Updated 2 years ago
- 80x faster and 95% accurate language identification with Fasttext☆146Updated last year
- Python API for PDF documents☆118Updated 5 months ago
- Viewer for the structure extracted by Grobid on PDF documents☆45Updated last week
- 📚 Process PDFs, Word documents and more with spaCy☆411Updated last month
- Simplify DOCX files to JSON☆225Updated 4 months ago
- Create and modify Word documents with Python☆143Updated 8 months ago
- Benchmarking PDF libraries☆254Updated last year
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.☆59Updated 9 months ago
- Append/Concatenate .docx documents☆106Updated 6 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆166Updated 8 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding☆119Updated last year
- A Python library to chunk/group your texts based on semantic similarity.☆93Updated 7 months ago
- A Python library for extracting text from PDFs without losing the formatting of the PDF content.☆76Updated 3 years ago
- A Python asyncio wrapper for Tesseract-OCR.☆23Updated 3 months ago
- Extract structured text from PDFs quickly☆413Updated last week
- A simple component to display annotated text in Streamlit apps.☆536Updated 3 weeks ago
- A Python Search Engine for Humans 🥸☆203Updated 9 months ago
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author.☆102Updated 10 months ago
- 🦦 weasel: A small and easy workflow system☆75Updated 7 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆175Updated last year
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis☆315Updated 2 years ago
- ☆207Updated 2 months ago
- Convert Word documents (.docx files) to HTML☆895Updated last month