A free tool to OCR a PDF and add a text "layer" to the original file, making it a searchable PDF. Uses only open source tools. Please tip!
☆303May 25, 2025Updated 11 months ago
Alternatives and similar repositories for pdf2pdfocr
Users that are interested in pdf2pdfocr are comparing it to the libraries listed below.
- `pdf2searchablepdf input.pdf` = voila! "input_searchable.pdf" is created & now has searchable text!☆137Aug 2, 2023Updated 2 years ago
- ☆10Mar 16, 2023Updated 3 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆47Mar 31, 2025Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.☆411Aug 10, 2024Updated last year
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR …☆67Jan 6, 2024Updated 2 years ago
- SciViews R socket server☆12Aug 29, 2025Updated 8 months ago
- A simple library for segmenting legal texts☆18Apr 22, 2023Updated 3 years ago
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched☆33,447Updated this week
- Dead Sea Scrolls in TF format based on Abegg's data☆28Apr 22, 2026Updated last week
- Tool to OCR PDFs using Google Cloud Vision☆42Dec 7, 2022Updated 3 years ago
- Ever wanted to use custom discord emojis on other servers, without a nitro subscription? Well, with this script, YOU CAN without needing …☆25Jan 8, 2021Updated 5 years ago
- Onchain cap table management with an offchain SEC transfer agent-compliant DB.☆16Apr 12, 2026Updated 2 weeks ago
- postcorrection web☆12Mar 6, 2023Updated 3 years ago
- Configuration files for Unbound as a caching DNS server with DNSSEC validation and DNS over TLS forwarding.☆13Jan 13, 2019Updated 7 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)☆202May 21, 2025Updated 11 months ago
- Chambua is an open-source semantic tagging application that analyses text and extracts names of people, places (& geocodes them), organis…☆33Nov 12, 2021Updated 4 years ago
- Extract palette from an image☆15Nov 20, 2022Updated 3 years ago
- A post-processing tool for scanned sheets of paper.☆1,173Jul 11, 2024Updated last year
- Visual, page-by-page comparison of two PDF files☆21Apr 7, 2014Updated 12 years ago
- Easily work with .docx files from Clojure (a wrapper on Apache POI library).☆12Sep 4, 2019Updated 6 years ago
- Technical Committee Documents☆16Apr 20, 2026Updated last week
- NLP Web API for Legal Text☆18Dec 23, 2022Updated 3 years ago
- framework and tools for statically-generated and dynamic online reading environments☆14Jun 12, 2017Updated 8 years ago
- Tools to process books in a cloud based pipeline system☆65Apr 16, 2026Updated 2 weeks ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces☆39Apr 30, 2025Updated 11 months ago
- A mirror of https://git.tecosaur.net/tec/pdftotext.el☆12Jan 4, 2024Updated 2 years ago
- Internet Chess ToolKit is a java based set of libraries and widgets useful for performing common tasks such as reading PGN, FEN, and gene…☆12Feb 22, 2017Updated 9 years ago
- guides and test data for OCR4all☆32Oct 4, 2022Updated 3 years ago
- Write beautifully short contract. https://reference.legal/ is a referenceable clause library to standardize contracts once and for all.☆13Jul 12, 2022Updated 3 years ago
- A web app for transliterating Hebrew☆18Apr 21, 2026Updated last week
- jwt rest api using realworld spec and google apps script☆14Jan 5, 2023Updated 3 years ago
- OCR-D python tools☆33Aug 16, 2024Updated last year
- A text annotation plugin for Protege 5+☆18Mar 10, 2026Updated last month
- API endpoints for the BibleGet I/O project☆14Updated this week
- A post-processing tool for scanned sheets of paper.☆86Mar 9, 2024Updated 2 years ago
- A MIDI filter that uses the ALSA MIDI libraries to provide configurable filtering functionality☆11Mar 1, 2017Updated 9 years ago
- API client for fetching and comparing passages from legislation☆14Jan 26, 2025Updated last year
- A miniature version of the l4 language☆13Jun 29, 2025Updated 10 months ago
- Docker containers on a Tailnet☆21Nov 20, 2022Updated 3 years ago