jsoma / kull
A tool to interactively select text regions of PDFs and images. Mostly for use with PDFQuery or tesseract (UZN/OCR zone files)
☆53 Updated 7 years ago
Alternatives and similar repositories for kull
Users that are interested in kull are comparing it to the libraries listed below
- Simplify using uzn files with tesseract for OCR ☆18 Updated 2 years ago
- Scripts and results from our OCR roundup, available on Source ☆150 Updated 6 years ago
- A fast and friendly PDF scraping library. ☆776 Updated last year
- Extract tables from PDF pages. ☆291 Updated 4 years ago
- Working with hOCR in Javascript ☆129 Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆392 Updated 9 months ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 Updated 3 years ago
- Python interface to Apache PDFBox command-line tools. ☆75 Updated 2 years ago
- Get semantic HTML from PDFs, recover lost text, tables, data... in bulk. ☆31 Updated 6 months ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆288 Updated last week
- Locate and extract tables and figures in PDFs ☆42 Updated 4 years ago
- take scanned image, and hocr output from tesseract, create PDF. That's it. ☆25 Updated 2 years ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆95 Updated 6 years ago
- A tiny frontend for OCRing PDF files via the web. ☆49 Updated 5 years ago
- Extract structured data from PDF invoices ☆14 Updated 4 years ago
- HOCR Specification Python Parser ☆13 Updated 9 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- Pure-python library for adding annotations to PDFs ☆202 Updated 4 years ago
- a machine learning implementation of OCR ☆96 Updated 2 years ago
- Structured Data from PDF image-based files ☆88 Updated 12 years ago
- Python library to extract tabular data from images and scanned PDFs ☆278 Updated 10 months ago
- Deep learning model for OCR of document fields ☆36 Updated 8 years ago
- Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents. ☆180 Updated last week
- A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books. ☆186 Updated last week
- Extracting addresses from text ☆42 Updated 7 years ago
- Extract structured data from PDF invoices ☆1,981 Updated 2 weeks ago
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆106 Updated 4 years ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆448 Updated last year
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆65 Updated last year
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 9 months ago