jsoma / kull
A tool to interactively select text regions of PDFs and images. Mostly for use with PDFQuery or tesseract (UZN/OCR zone files)
☆53 · Updated 8 years ago
Alternatives and similar repositories for kull
Users that are interested in kull are comparing it to the libraries listed below
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- A fast and friendly PDF scraping library. ☆783 · Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆407 · Updated last year
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 · Updated 3 years ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆460 · Updated 2 years ago
- Extract tables from PDF pages. ☆298 · Updated 5 years ago
- Extract structured data from PDF invoices ☆2,117 · Updated 2 weeks ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆328 · Updated 2 years ago
- A supermarket receipt parser written in Python using tesseract OCR ☆850 · Updated last year
- Simplify DOCX files to JSON ☆256 · Updated last year
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆303 · Updated 8 months ago
- A web interface to extract tabular data from PDFs ☆1,787 · Updated last year
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆156 · Updated 2 years ago
- Pure-python library for adding annotations to PDFs ☆212 · Updated 4 years ago
- A general purpose PDF text-layer redaction tool for Python 2/3. ☆208 · Updated last year
- Adapting the python library OCRmyPDF to run in an AWS Lambda Function ☆17 · Updated 3 years ago
- A file conversion microservice written in Node ☆34 · Updated 2 years ago
- Get semantic HTML from PDFs, recover lost text, tables, data... in bulk. ☆35 · Updated last year
- Locate and extract tables and figures in PDFs ☆43 · Updated 4 years ago
- Collection of RPA workflows for TagUI ☆74 · Updated 4 years ago
- Extracting Semi-Structured Data from PDFs on a large scale ☆52 · Updated 3 years ago
- A package to structure Australian addresses ☆196 · Updated 3 years ago
- Python script to do PDF OCR conversion using Tesseract ☆375 · Updated 2 years ago
- A free, open-source expert system for guided interviews and document assembly, based on Python, YAML, and Markdown. ☆921 · Updated last week
- A Python tool to help extracting information from structured PDFs. ☆427 · Updated 2 weeks ago
- CVparser is software for parsing or extracting data out of CV/resumes. ☆42 · Updated 2 years ago
- Apache Tika Server with Tesseract 4 Docker Setup ☆23 · Updated 4 years ago
- Easily build and maintain any kind of contract. Free and Open Source ☆99 · Updated 8 years ago
- Python library and command line tool for parsing pdf bank statements ☆67 · Updated last year
- Graphical User Interface for factur-x library with basic functionalities ☆24 · Updated 6 years ago