deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
☆65 Updated 8 months ago
Related projects:
- A tiny frontend for OCRing PDF files via the web. ☆45 Updated 4 years ago
- Short script for removing watermarks from PDF files. Requires pdftk. ☆57 Updated 5 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆44 Updated 5 months ago
- Ergonomic line-by-line transcription of scanned text. ☆47 Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆176 Updated last month
- OCR for DjVu ☆44 Updated last year
- ReadablePDF streamlines the effort of turning a not so great PDF into a more easily readable PDF (or of course a pretty decent PDF into a… ☆33 Updated 3 years ago
- Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder ☆86 Updated 3 months ago
- Building scantailor and its dependencies ☆54 Updated last year
- Fast PDF generation and compression. Deals with millions of pages daily. ☆97 Updated last month
- Take a scanned image and hOCR output from tesseract, create a PDF. That's it. ☆23 Updated last year
- web interface for recoll desktop search ☆268 Updated 4 years ago
- smoothscan is a tool to convert scanned text into a vectorized output form. ☆67 Updated 10 years ago
- Efficient hOCR tooling ☆38 Updated last week
- A tiny, hackable, two-way cloud synchronisation client for Linux ☆53 Updated 4 years ago
- A free Windows graphical interface to the Tesseract 4.0 OCR engine. ☆54 Updated 2 years ago
- PDF to DjVu converter ☆92 Updated 8 months ago
- A post-processing tool for scanned sheets of paper. ☆69 Updated 6 months ago
- The hOCR Embedded OCR Workflow and Output Format ☆72 Updated last month
- Master repository which includes most other OCR-D repositories as submodules ☆71 Updated last month
- Tool to OCR PDFs using Google Cloud Vision ☆38 Updated last year
- ☆10 Updated last week
- Automatic de-keystoning for single camera DIY book scanners. ☆47 Updated 4 years ago
- Make your PDF files text-searchable (A GUI for OCRmyPDF) ☆30 Updated 2 months ago
- Nextcloud OCR (optical character recognition) processing for images with tesseract-js ☆107 Updated 2 weeks ago
- User contributed (non Google) OCR models for Tesseract ☆19 Updated last year
- Reads HTML files, converting tables into CSV files ☆31 Updated 4 years ago
- PdfJs-Annotator is a proof of concept project that integrates AnnotatorJs (http://annotatorjs.org/) with the PdfJs (https://mozilla.githu… ☆22 Updated 4 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆266 Updated 7 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆363 Updated last month