dfop02 / html4docx
Convert html to docx
☆31 Updated last week
Alternatives and similar repositories for html4docx
Users that are interested in html4docx are comparing it to the libraries listed below
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆183 Updated last week
- Convert html to docx ☆81 Updated 11 months ago
- A python library to make filling pdfs much easier ☆151 Updated 10 months ago
- Python API for PDF documents ☆122 Updated 9 months ago
- Simplify DOCX files to JSON ☆240 Updated 8 months ago
- Logical structure analysis for visually structured documents ☆90 Updated 2 years ago
- A python library to define and validate data types in Docling. ☆147 Updated this week
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆105 Updated 9 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆393 Updated 10 months ago
- Append/Concatenate .docx documents ☆114 Updated 10 months ago
- An extendable docx file format parser and converter ☆192 Updated last month
- Working with hOCR in Javascript ☆129 Updated 2 years ago
- Pipeline development framework, easy to experiment and compare different pipelines, quick to deploy to workflow orchestration tools ☆17 Updated last year
- ☆125 Updated this week
- `pdfstructure` detects, splits and organizes the documents text content into its natural structure as envisioned by the author. ☆104 Updated last year
- A python based HTML to text conversion library, command line client and Web service. ☆311 Updated 3 weeks ago
- CRUD Word documents with Python ☆11 Updated 6 months ago
- Conversions between various OCR formats ☆78 Updated 2 years ago
- A curated list of resources around PDF files ☆134 Updated 10 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆349 Updated 2 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆73 Updated 10 months ago
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆11 Updated last year
- Parallel and LAzY Analyzer for PDFs 🏖️ ☆31 Updated this week
- OCR & Ground Truth Resources ☆75 Updated 3 years ago
- Python bindings to PDFium ☆585 Updated last week
- A Python asyncio wrapper for Tesseract-OCR. ☆26 Updated 7 months ago
- PDF to XML ALTO file converter ☆242 Updated 2 weeks ago
- A dataset of region-annotated scientific articles. ☆21 Updated 5 years ago
- ☆40 Updated 4 years ago
- ☆743 Updated 2 months ago