Amourspirit / python_ooo_dev_tools
☆23 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for python_ooo_dev_tools
- ☆11 · Updated 4 months ago
- LibreOffice Programming ☆8 · Updated last year
- Logical structure analysis for visually structured documents ☆84 · Updated 2 years ago
- Python bindings to PDFium ☆427 · Updated 3 weeks ago
- OCRmyPDF EasyOCR plugin ☆53 · Updated 2 months ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 · Updated 3 months ago
- Python binding to Poppler-cpp pdf library ☆98 · Updated 2 months ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆102 · Updated 3 months ago
- This repository contains code for line detection, character detection, and recognition on cuneiform 2D images ☆30 · Updated 5 years ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 · Updated last month
- A curated list of resources around PDF files ☆108 · Updated 3 months ago
- Industry supported, open source PDF/A validation library ☆278 · Updated this week
- Python API for PDF documents ☆117 · Updated 2 months ago
- Run OCR, extract information from documents and classify them. In addition, annotate documents and build custom NLP and computer vision m… ☆62 · Updated last week
- Extracting Semi-Structured Data from PDFs on a large scale ☆51 · Updated 2 years ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 5 years ago
- An index of PDF-centric corpora ☆113 · Updated last month
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆370 · Updated 3 months ago
- A HarfBuzz Python binding ☆68 · Updated 2 weeks ago
- Conversions between various OCR formats ☆71 · Updated last year
- RUPS is an acronym for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF docume… ☆288 · Updated last month
- ☆585 · Updated 3 weeks ago
- CSS Paged Media tutorial and review of tools (repository for print-css.rocks) ☆172 · Updated 3 weeks ago
- Industry-based resolutions for issues and errata reported against any PDF-related specification ☆66 · Updated this week
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 · Updated 8 months ago
- PAGE XML format collection for document image page content and more ☆66 · Updated 3 years ago
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author. ☆101 · Updated 7 months ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆180 · Updated last month
- MathML Cloud API ☆28 · Updated 3 years ago
- Deskew is a command-line tool for deskewing scanned text documents. It uses the Hough transform to detect "text lines" in the image. As an ou… ☆166 · Updated this week