Python script to do PDF OCR conversion using Tesseract
☆372Jun 2, 2023Updated 2 years ago
Alternatives and similar repositories for pypdfocr
Users that are interested in pypdfocr are comparing it to the libraries listed below.
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.☆2,257Jun 24, 2022Updated 3 years ago
- Extract tables from scanned image PDFs using Optical Character Recognition.☆277Jun 9, 2020Updated 5 years ago
- Physical unit systems (Metric, English, Natural, Planck, etc...)☆20Sep 24, 2025Updated 7 months ago
- Web-based page layout editor created for EMOP (Early Modern OCR Project).☆11May 21, 2021Updated 4 years ago
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched☆33,447Updated this week
- A package of code for quickly and easily annotating videos in a web browser☆22Apr 17, 2012Updated 14 years ago
- OneResumé is a data-driven resumé generator for text and Microsoft Word documents.☆14Apr 14, 2015Updated 11 years ago
- Python-based tools for document analysis and OCR☆3,470May 22, 2021Updated 4 years ago
- A LDP Implementation backed by BlazeGraph☆26Oct 31, 2017Updated 8 years ago
- ☆10Jul 15, 2019Updated 6 years ago
- Kylie is a blond and small Elixir client for the Cayley graph database☆12Apr 17, 2026Updated 2 weeks ago
- ARCHIVED: A Python API for Tesseract☆20Jul 25, 2017Updated 8 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.☆1,279Dec 1, 2020Updated 5 years ago
- Symfony bundle for defining menu-structures and breadcrumbs in the KnpMenuBundle (DISCONTINUED)☆12Feb 3, 2019Updated 7 years ago
- Extracts locations from text☆15May 8, 2020Updated 5 years ago
- This project is no longer supported. A pre-configured collection of tools including Social Feed Manager and Lentil for easily building Tw…☆16Feb 9, 2018Updated 8 years ago
- BaseX Distribution Files☆22Oct 22, 2025Updated 6 months ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh.☆10Jun 10, 2025Updated 10 months ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab☆929Jun 13, 2018Updated 7 years ago
- Scan, index, and archive all of your paper documents☆7,923Apr 6, 2021Updated 5 years ago
- Python bindings to the Tesseract API☆66Jul 5, 2016Updated 9 years ago
- A small program using libpoppler-gtk that searches for regular expressions in PDF files☆23Mar 12, 2014Updated 12 years ago
- Image comparison QA tool for digital preservation workflows.☆14Nov 17, 2014Updated 11 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six.☆5,295Dec 7, 2022Updated 3 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op…☆25Jul 5, 2015Updated 10 years ago
- Harvard University Library Cloud API☆11Feb 25, 2022Updated 4 years ago
- Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python.☆25Jan 13, 2023Updated 3 years ago
- Documentation for Project Electron☆14Dec 2, 2024Updated last year
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.☆1,654Updated this week
- 8-bit raspberry pi game☆14Jan 19, 2017Updated 9 years ago
- Python wrapper for xpdf☆19Nov 28, 2019Updated 6 years ago
- A simple viewer and inspection tool for text boxes in PDF documents☆96Mar 7, 2022Updated 4 years ago
- Budgeting system in Django 3 with automatic calculation☆15Aug 31, 2023Updated 2 years ago
- CI scripts for validating and processing metadata☆11Dec 7, 2019Updated 6 years ago
- django CMS Icon adds capabilities to implement Font or SVG icons as plugins into your project.☆19Apr 14, 2026Updated 2 weeks ago
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba…☆13Dec 13, 2022Updated 3 years ago
- Trusty URI specification☆21Feb 23, 2015Updated 11 years ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files☆9,967Updated this week
- Turpenscape allows designers to create Inkscape (and Gimp!) palettes from an image or a URL☆14Feb 5, 2017Updated 9 years ago