deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
☆67Updated last year
Alternatives and similar repositories for pmOCR
Users that are interested in pmOCR are comparing it to the libraries listed below
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!☆299Updated 7 months ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding☆154Updated 2 years ago
- A free Windows graphical interface to the Tesseract 4.0 OCR engine.☆61Updated 3 years ago
- Short script for removing watermarks from PDF files. Requires pdftk.☆59Updated 6 years ago
- web interface for recoll desktop search☆292Updated 5 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Updated 9 months ago
- Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI☆41Updated last year
- A post-processing tool for scanned sheets of paper.☆85Updated last year
- Textricator is a tool to extract text from documents and generate structured data.☆350Updated 9 months ago
- Export / upload emails from Thunderbird mbox files to single eml files☆23Updated 2 years ago
- 📑 Scripts to repair, verify, OCR, compress, wrangle, crop (etc.) PDFs☆70Updated last year
- ReadablePDF streamlines the effort of turning a not so great PDF into a more easily readable PDF (or of course a pretty decent PDF into a…☆33Updated 4 years ago
- Frontend part i.e. web-based user interface of Papermerge Document Management System☆37Updated 2 years ago
- Tool to OCR PDFs using Google Cloud Vision☆42Updated 3 years ago
- Make your PDF files text-searchable (A GUI for OCRmyPDF)☆50Updated last year
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/…☆131Updated 2 years ago
- Prepress preparing tool and PDF editor☆19Updated 2 years ago
- Python script to do PDF OCR conversion using Tesseract☆376Updated 2 years ago
- Data Generator for Training Tesseract OCR☆10Updated 5 years ago
- Very simple file search web interface with a locate / mlocate backend☆25Updated 11 months ago
- A Chrome extension that automatically saves visited pages and downloaded URLs to your bookmarks.☆16Updated 9 years ago
- Juris-M is a variant of the free and friendly Zotero research platform, with support for legal and multilingual materials.☆88Updated 2 months ago
- PDF to XML ALTO file converter☆258Updated last month
- Take a scanned image and the hOCR output from Tesseract, and create a PDF. That's it.☆27Updated 2 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler.☆54Updated last month
- Ergonomic line-by-line transcription of scanned text.☆54Updated 5 years ago
- A Chrome/Opera extension for scrolling a lazy-loading content on any website☆21Updated 4 years ago
- rsyncd installer with cygwin for windows backup clients☆64Updated 4 years ago
- Tool to index and serve HTML files. Powered by Datasette.☆110Updated 3 years ago
- 💡✏️️ ⬇️️ JSON to Markdown converter - Generate Markdown from format independent JSON☆78Updated 6 years ago