deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
☆65 Updated last year
Alternatives and similar repositories for pmOCR:
Users who are interested in pmOCR are comparing it to the repositories listed below:
- Short script for removing watermarks from PDF files. Requires pdftk. ☆58 Updated 6 years ago
- A tiny frontend for OCRing PDF files via the web. ☆46 Updated 5 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 Updated this week
- A free tool to OCR a PDF and add a text "layer" to the original file, making a searchable PDF. Uses only open source tools. Please tip! ☆282 Updated last year
- web interface for recoll desktop search ☆284 Updated 4 years ago
- 📑 Scripts to repair, verify, OCR, compress, wrangle, crop (etc.) PDFs ☆68 Updated 10 months ago
- A free Windows graphical interface to the Tesseract 4.0 OCR engine. ☆58 Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆187 Updated last month
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆148 Updated last year
- TagSpaces Web Clipper for Chrome and Firefox ☆42 Updated 2 months ago
- generate clean readable PDFs from web-articles ☆30 Updated last year
- A post-processing tool for scanned sheets of paper. ☆80 Updated last year
- OCR for DjVu ☆48 Updated 2 years ago
- A repository for LogicalDOC DMS - Community Edition - Docker image https://www.logicaldoc.com/download-logicaldoc-community ☆34 Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 Updated this week
- LibGen ☆16 Updated 11 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- QtSemanticNotes is a personal knowledge base, personal wiki or just note taking application that features automatic linking, tree view an… ☆18 Updated 7 years ago
- CLI implementation of httpreserve that can test links and retrieve internet archive replacements ☆10 Updated 4 months ago
- Web interface for EveryDocs (https://github.com/jonashellmann/everydocs-core) ☆12 Updated 3 months ago
- PDF minifier that allows removing duplicate data, re-compresses images, creation of PDF/A-1b and digital PDF signing ☆55 Updated 6 months ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆58 Updated 8 months ago
- A dead simple web-clipper | ✂Capture ⇒ ⊞ Select ⇒ ✔Done ☆32 Updated 7 years ago
- WIP tag-based file organizer & search ☆39 Updated last year
- SingleFile docker implementation providing access via CLI and WEB service ☆44 Updated 9 months ago
- Tool to OCR PDFs using Google Cloud Vision ☆41 Updated 2 years ago
- Fess Site Search provides JavaScript files. ☆23 Updated last week
- Performance comparisons of cloud backup storages as Duplicacy backends ☆136 Updated 5 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- A set of tools for working with JSON, CSV and Excel workbooks ☆78 Updated 2 months ago