Python script to do PDF OCR conversion using Tesseract
☆375 · Jun 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for pypdfocr
Users who are interested in pypdfocr are comparing it to the libraries listed below.
- (Python) Execute tesseract OCR on a multi-page PDF. ☆19 · Jun 30, 2023 · Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆411 · Aug 10, 2024 · Updated last year
- Node.js implementation of the PirateBox Server inspired by David Darts ☆15 · Dec 8, 2015 · Updated 10 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,258 · Jun 24, 2022 · Updated 3 years ago
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched ☆33,120 · Updated this week
- gathering point for open source OCR scripts and diffs ☆43 · Jun 27, 2014 · Updated 11 years ago
- Anonymize sensitive data in a controlled, pseudo-random way ☆14 · Dec 18, 2015 · Updated 10 years ago
- OneResumé is a data-driven resumé generator for text and Microsoft Word documents. ☆14 · Apr 14, 2015 · Updated 10 years ago
- A calagator app built in React Native. It works on both iOS and Android ☆14 · Apr 24, 2016 · Updated 9 years ago
- PyTorch library for synthesizing programs from natural language ☆18 · Jul 25, 2024 · Updated last year
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,280 · Dec 1, 2020 · Updated 5 years ago
- This project is no longer supported. A pre-configured collection of tools including Social Feed Manager and Lentil for easily building Tw… ☆16 · Feb 9, 2018 · Updated 8 years ago
- BaseX Distribution Files ☆22 · Oct 22, 2025 · Updated 5 months ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆931 · Jun 13, 2018 · Updated 7 years ago
- Python bindings to the Tesseract API ☆66 · Jul 5, 2016 · Updated 9 years ago
- Models, vocabularies and behaviours for Hyrax applications. ☆11 · Sep 21, 2023 · Updated 2 years ago
- Image comparison QA tool for digital preservation workflows. ☆14 · Nov 17, 2014 · Updated 11 years ago
- Python: Bad Ideas ☆11 · May 29, 2017 · Updated 8 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,300 · Dec 7, 2022 · Updated 3 years ago
- Harvard University Library Cloud API ☆11 · Feb 25, 2022 · Updated 4 years ago
- Pyfilesystem2 implementation for OneDrive ☆10 · Mar 20, 2026 · Updated 2 weeks ago
- convert Chinese to PinYin ☆10 · Feb 21, 2017 · Updated 9 years ago
- Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python. ☆25 · Jan 13, 2023 · Updated 3 years ago
- Managing text data from the commandline ☆14 · Feb 25, 2017 · Updated 9 years ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,653 · Mar 28, 2026 · Updated last week
- 8-bit raspberry pi game ☆14 · Jan 19, 2017 · Updated 9 years ago
- An sshfs profile management script written in bash ☆23 · Nov 23, 2022 · Updated 3 years ago
- Python wrapper for xpdf ☆19 · Nov 28, 2019 · Updated 6 years ago
- 💬 A small event handling library on top of the Slack RTM API. ☆15 · Jan 12, 2020 · Updated 6 years ago
- CodFS: An Erasure-Coded Clustered Storage System for Efficient Updates and Recovery ☆10 · Mar 31, 2015 · Updated 11 years ago
- A zero-shot captcha solver. ☆16 · Dec 22, 2023 · Updated 2 years ago
- A comprehensive Python client library for the Kroger Public API, featuring robust token management, comprehensive examples, and easy-to-u… ☆20 · Aug 30, 2025 · Updated 7 months ago
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba… ☆13 · Dec 13, 2022 · Updated 3 years ago
- Trusty URI specification ☆20 · Feb 23, 2015 · Updated 11 years ago
- ☆13 · Nov 11, 2025 · Updated 4 months ago
- Bootstrap is an Ansible playbook that you can use to set up and immediately secure a brand new server, such as a fresh Linode. This playb… ☆24 · Aug 10, 2014 · Updated 11 years ago
- Manifests of the public domain images uploaded to Flickr Commons, with descriptive information about the books they were taken from. ☆75 · Apr 11, 2014 · Updated 11 years ago
- The Next-Generation Architecture for Format-Aware Characterization. ☆13 · Jan 27, 2022 · Updated 4 years ago
- Host server monitoring app for Django Admin. Allows to schedule checks on hosts and notify results to administrators by mail. ☆66 · Nov 6, 2014 · Updated 11 years ago