ShayHill / docx2python
Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.
☆177 · Updated this week
Alternatives and similar repositories for docx2python:
Users who are interested in docx2python are comparing it to the libraries listed below.
- A Python tool to help extracting information from structured PDFs. ☆395 · Updated last week
- Simplify DOCX files to JSON ☆225 · Updated 5 months ago
- Streamlit PDF viewer ☆132 · Updated last month
- Logical structure analysis for visually structured documents ☆86 · Updated 2 years ago
- Benchmarking PDF libraries ☆263 · Updated last year
- 80x faster and 95% accurate language identification with Fasttext ☆148 · Updated last year
- Create and modify Word documents with Python ☆143 · Updated 8 months ago
- The Python docx package cannot read paragraphs, tables and images in document order. It can only render all the paragraphs at once or all… ☆77 · Updated last year
- Adobe PDFServices python SDK Samples ☆144 · Updated 4 months ago
- Python client for GROBID Web services ☆314 · Updated last week
- Viewer for the structure extracted by Grobid on PDF documents ☆46 · Updated last month
- Demos, examples and utilities using PyMuPDF ☆631 · Updated 8 months ago
- Python API for PDF documents ☆118 · Updated 6 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆259 · Updated this week
- A simple component to display annotated text in Streamlit apps. ☆540 · Updated last month
- Python bindings to PDFium ☆542 · Updated this week
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆172 · Updated last week
- A python library for extracting text from PDFs without losing the formatting of the PDF content. ☆77 · Updated 3 years ago
- A python based HTML to text conversion library, command line client and Web service. ☆295 · Updated last month
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆59 · Updated 10 months ago
- A Python library to extract tabular data from PDFs ☆67 · Updated this week
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 · Updated last year
- multimodal document analysis ☆163 · Updated 9 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆103 · Updated 6 months ago
- ☆27 · Updated last year
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆175 · Updated last month
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆135 · Updated 2 months ago
- `pdfstructure` detects, splits and organizes the documents text content into its natural structure as envisioned by the author. ☆103 · Updated 11 months ago
- Python binding to Poppler-cpp pdf library ☆106 · Updated 6 months ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆110 · Updated last week