LeoFCardoso/pdf2pdfocr

A free tool to OCR a PDF and add a text "layer" to the original file, making a searchable PDF. Uses only open source tools. Please tip!

☆303

Alternatives and similar repositories for pdf2pdfocr

Users who are interested in pdf2pdfocr are comparing it to the libraries listed below.

ElectricRCAircraftGuy / PDF2SearchablePDF
`pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text!
☆137 · Aug 2, 2023 · Updated 2 years ago
writecrow / ocr2text
Convert a PDF via OCR to a TXT file in UTF-8 encoding
☆160 · Oct 3, 2023 · Updated 2 years ago
OpaitSoftware / TesseractStudio.Net
A free Windows graphical interface to the Tesseract 4.0 OCR engine.
☆61 · Feb 16, 2022 · Updated 4 years ago
seuretm / ocrd_typegroups_classifier
☆10 · Mar 16, 2023 · Updated 3 years ago
mauvilsa / tesseract-recognize
Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format
☆47 · Mar 31, 2025 · Updated last year
virantha / pypdfocr
Python script to do PDF OCR conversion using Tesseract
☆371 · Jun 2, 2023 · Updated 3 years ago
Open-Cap-Table-Coalition / OCF-Tools
xState-based validation tool for OCF files
☆15 · Jun 27, 2026 · Updated last week
ocropus / hocr-tools
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.
☆415 · Aug 10, 2024 · Updated last year
deajan / pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR …
☆67 · Jan 6, 2024 · Updated 2 years ago
dinosauria123 / gcv2hocr
gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf.
☆108 · Oct 22, 2020 · Updated 5 years ago
neelguha / legal-segmenter
A simple library for segmenting legal texts
☆18 · Apr 22, 2023 · Updated 3 years ago
ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆33,987 · Jun 27, 2026 · Updated last week
replicate / ideogram-inpainting-example-js
A simple demo showing how to use the Ideogram inpainting model on Replicate using Node.js.
☆16 · Oct 24, 2024 · Updated last year
PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆42 · Dec 7, 2022 · Updated 3 years ago
ETCBC / dss
Dead Sea Scrolls in TF format based on Abegg's data
☆31 · Apr 22, 2026 · Updated 2 months ago
dr-huv / discordEmojiGrabber
Ever wanted to use custom discord emojis on other servers, without a nitro subscription? Well, with this script, YOU CAN without needing …
☆26 · Jan 8, 2021 · Updated 5 years ago
transfer-agent-protocol / tap-cap-table
Onchain cap table management with an offchain SEC transfer agent-compliant DB.
☆16 · Jun 27, 2026 · Updated last week
cisocrgroup / pocoweb
postcorrection web
☆12 · Mar 6, 2023 · Updated 3 years ago
benwiggy / QuartzFilters
Quartz Filters for MacOS, providing transformations to PDF files.
☆16 · May 3, 2022 · Updated 4 years ago
tleyden / open-ocr-client
Client library for OpenOCR
☆32 · Dec 3, 2014 · Updated 11 years ago
JohnAustinDev / osis-converters
Automatically exported from code.google.com/p/osis-converters
☆13 · Jun 6, 2026 · Updated 3 weeks ago
tagattie / Unbound-DNSSEC-DNS-over-TLS
Configuration files for Unbound as a caching DNS server with DNSSEC validation and DNS over TLS forwarding.
☆13 · Jan 13, 2019 · Updated 7 years ago
UB-Mannheim / ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
☆204 · May 21, 2025 · Updated last year
Tenrec-Builders / marker-crop
A set of tools for rotating, cropping, and binding the images from a scanned book into a PDF.
☆20 · Aug 15, 2018 · Updated 7 years ago
unpaper / unpaper
A post-processing tool for scanned sheets of paper.
☆1,190 · Jul 11, 2024 · Updated last year
kspeeckaert / pyPdfCompare
Visual, page-by-page comparison of two PDF files
☆21 · Apr 7, 2014 · Updated 12 years ago
cseas / ocr-table
Extract tables from scanned image PDFs using Optical Character Recognition.
☆277 · Jun 9, 2020 · Updated 6 years ago
Open-Legal / LegalNLP-API
NLP Web API for Legal Text
☆19 · Dec 23, 2022 · Updated 3 years ago
usfm-bible / tcdocs
Technical Committee Documents
☆17 · Updated this week
hyperbox / client
Hyperbox Client
☆13 · Dec 27, 2021 · Updated 4 years ago
mittagessen / kraken
OCR engine for all the languages
☆1,022 · Jun 26, 2026 · Updated last week
jtauber / online-reader
framework and tools for statically-generated and dynamic online reading environments
☆14 · Jun 12, 2017 · Updated 9 years ago
rescribe / bookpipeline
Tools to process books in a cloud based pipeline system
☆64 · May 28, 2026 · Updated last month
tecosaur / pdftotext.el
A mirror of https://git.tecosaur.net/tec/pdftotext.el
☆12 · Jan 4, 2024 · Updated 2 years ago
vivo-archive / Tools
Code for several utilities for use with VIVO
☆11 · Nov 15, 2012 · Updated 13 years ago
tanaikech / ProjectApp
This is a project library for Google Apps Script (GAS).
☆12 · Jan 29, 2018 · Updated 8 years ago
dot-legal / reference
Write beautifully short contract. https://reference.legal/ is a referenceable clause library to standardize contracts once and for all.
☆13 · Jul 12, 2022 · Updated 3 years ago
lastlink / realworld-apps-script
jwt rest api using realworld spec and google apps script
☆14 · Jan 5, 2023 · Updated 3 years ago
charlesLoder / hebrewTransliteration
A web app for transliterating Hebrew
☆18 · Jun 24, 2026 · Updated last week