deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
☆65 · Updated last year
Alternatives and similar repositories for pmOCR:
Users who are interested in pmOCR are comparing it to the libraries listed below:
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 · Updated 10 months ago
- ReadablePDF streamlines the effort of turning a not so great PDF into a more easily readable PDF (or of course a pretty decent PDF into a… ☆33 · Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 · Updated 4 years ago
- Very simple file search web interface with a locate / mlocate backend ☆25 · Updated 2 months ago
- smoothscan is a tool to convert scanned text into a vectorized output form. ☆67 · Updated 11 years ago
- Short script for removing watermarks from PDF files. Requires pdftk. ☆58 · Updated 6 years ago
- A tiny frontend for OCRing PDF files via the web. ☆46 · Updated 5 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 7 years ago
- TagSpaces Web Clipper for Chrome and Firefox ☆41 · Updated 2 months ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆187 · Updated last month
- Extract meaningful content from pdf and psd file, such as texts and images both linked into a common JSON string ☆37 · Updated 7 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆280 · Updated last year
- An expandable and scalable OCR pipeline ☆87 · Updated 7 years ago
- PAGE XML format collection for document image page content and more ☆67 · Updated 3 years ago
- A post-processing tool for scanned sheets of paper. ☆79 · Updated 11 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 · Updated last week
- Conversions between various OCR formats ☆74 · Updated last year
- ES Local Indexer - Desktop search powered by Elasticsearch ☆27 · Updated 5 years ago
- The CIS OCR PostCorrectionTool ☆41 · Updated 2 years ago
- Efficient hOCR tooling ☆42 · Updated 2 weeks ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆386 · Updated 6 months ago
- An online annotation platform for teaching and learning in the humanities. ☆107 · Updated 3 weeks ago
- Portable Batch environment ☆13 · Updated 7 years ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- take scanned image, and hocr output from tesseract, create PDF. That's it. ☆25 · Updated last year
- A free Windows graphical interface to the Tesseract 4.0 OCR engine. ☆58 · Updated 3 years ago
- PDF to XML ALTO file converter ☆231 · Updated this week
- A browser extension providing Open Access bibliographical services ☆17 · Updated 2 years ago
- web interface for recoll desktop search ☆281 · Updated 4 years ago
- Export / upload emails from Thunderbird mbox files to single eml files ☆22 · Updated last year