eloops / hocr2pdf
Takes a scanned image and the hOCR output from Tesseract and creates a PDF. That's it.
☆27 · Updated 2 years ago
Alternatives and similar repositories for hocr2pdf
Users who are interested in hocr2pdf are comparing it to the libraries listed below.
- `pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text! ☆137 · Updated 2 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆303 · Updated 8 months ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- Read-only mirror of https://gitlab.gnome.org/GNOME/ocrfeeder ☆91 · Updated 4 months ago
- Number to number name and money text conversion libraries in C++, Java, JavaScript and Python & LibreOffice Calc Extension ☆75 · Updated last year
- Resource scheduling and event planning ☆67 · Updated 2 weeks ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆67 · Updated 2 years ago
- Live SQLite3 database master-slave replication with sqlite3-rdiff using rsync over SSH ☆40 · Updated 9 years ago
- Textricator is a tool to extract text from documents and generate structured data. ☆351 · Updated 10 months ago
- web interface for recoll desktop search ☆293 · Updated 5 years ago
- PortableSigner - A Commandline and GUI Tool to digital sign PDF files with X.509 certificates ☆123 · Updated 6 years ago
- Extract structured data from PDF invoices ☆14 · Updated 4 years ago
- Audio & Video chat for Etherpad - Video Conferencing with a focus on collaboration ☆75 · Updated 3 months ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR ☆35 · Updated 10 years ago
- This is the core libferris repository. It is the primary tree for development as at 2015. ☆24 · Updated 3 weeks ago
- Older code for Bibledit ☆10 · Updated 8 years ago
- Core server of the SEPIA Framework responsible for NLU, conversation, smart-service integration, user-accounts and more. ☆100 · Updated 2 years ago
- Minstrel is a FLOSS hybrid reading app specifically designed for Audio-eBooks ☆98 · Updated 9 years ago
- Tools to process books in a cloud based pipeline system ☆65 · Updated 2 months ago
- A post-processing tool for scanned sheets of paper. ☆1,151 · Updated last year
- User contributed (non Google) OCR models for Tesseract ☆30 · Updated 9 months ago
- Crop And Splice Segments (of scanned pages) ☆14 · Updated 6 years ago
- Pre-Recognize Library - library with algorithms for improving OCR quality. ☆110 · Updated 2 years ago
- Training scripts for Argos Translate ☆154 · Updated 2 weeks ago
- ☆140 · Updated 3 years ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆328 · Updated 2 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated this week
- Qbix Platform for powering Social Apps (http://qbix.com/platform) ☆93 · Updated last year
- All Apertium language pairs, modules, tools and core ☆70 · Updated 4 years ago
- Industry supported, open source PDF/A validation library ☆315 · Updated this week