deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
☆66Updated last year
Alternatives and similar repositories for pmOCR
Users that are interested in pmOCR are comparing it to the libraries listed below
- Convert a PDF via OCR to a TXT file in UTF-8 encoding☆154Updated 2 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!☆299Updated 6 months ago
- ReadablePDF streamlines the effort of turning a not so great PDF into a more easily readable PDF (or of course a pretty decent PDF into a…☆33Updated 4 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Updated 8 months ago
- Short script for removing watermarks from PDF files. Requires pdftk.☆59Updated 6 years ago
- Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI☆41Updated last year
- Tool to OCR PDFs using Google Cloud Vision☆42Updated 2 years ago
- web interface for recoll desktop search☆291Updated 5 years ago
- Juris-M is a variant of the free and friendly Zotero research platform, with support for legal and multilingual materials.☆87Updated last month
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)☆196Updated 6 months ago
- Building scantailor and its dependencies☆64Updated 2 years ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR☆35Updated 10 years ago
- 📑 Scripts to repair, verify, OCR, compress, wrangle, crop (etc.) PDFs☆70Updated last year
- Export / upload emails from Thunderbird mbox files to single eml files☆23Updated 2 years ago
- smoothscan is a tool to convert scanned text into a vectorized output form.☆67Updated 12 years ago
- BulkPDF is a free and easy to use open source software, which allows you to automatically fill an existing PDF form with different values. Onl…☆133Updated last year
- Tool to index and serve HTML files. Powered by Datasette.☆109Updated 3 years ago
- QtSemanticNotes is a personal knowledge base, personal wiki or just note taking application that features automatic linking, tree view an…☆18Updated 7 years ago
- A tiny frontend for OCRing PDF files via the web.☆51Updated 5 years ago
- PageArchiver (previously called "Scrapbook for SingleFile") is a Chrome extension that helps to archive pages for offline reading☆90Updated 12 years ago
- Recoll Full Text Search Plugin for Calibre☆25Updated 4 years ago
- A post-processing tool for scanned sheets of paper.☆85Updated last year
- A Chrome extension for automatically saving the visited pages and the downloaded URLs in your bookmarks.☆16Updated 9 years ago
- A dynamic media input form developed for oTranscribe☆18Updated 10 years ago
- GUI Wrapper for Pandoc (Crossplatform)☆42Updated last year
- `pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text!☆135Updated 2 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler.☆52Updated this week
- Zotero Word for Windows integration☆56Updated 3 months ago
- Ergonomic line-by-line transcription of scanned text.☆54Updated 4 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).☆25Updated 10 years ago