Python script to do PDF OCR conversion using Tesseract
☆371Jun 2, 2023Updated 2 years ago
Alternatives and similar repositories for pypdfocr
Users who are interested in pypdfocr are comparing it to the libraries listed below.
- (Python) Execute tesseract OCR on a multi-page PDF.☆19Jun 30, 2023Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.☆411Aug 10, 2024Updated last year
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.☆2,256Jun 24, 2022Updated 3 years ago
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched☆33,636May 12, 2026Updated last week
- This is about my idea of Knowledge-Graph-Building-Blocks as building blocks for knowledge graph applications.☆14Nov 23, 2022Updated 3 years ago
- OneResumé is a data-driven resumé generator for text and Microsoft Word documents.☆14Apr 14, 2015Updated 11 years ago
- The computer language for describing species and phenotypes☆15Updated this week
- TaxonWorks (https://taxonworks.org) documentation.☆14May 13, 2026Updated last week
- A LDP Implementation backed by BlazeGraph☆26Oct 31, 2017Updated 8 years ago
- ☆10Jul 15, 2019Updated 6 years ago
- ARCHIVED: A Python API for Tesseract☆20Jul 25, 2017Updated 8 years ago
- Python and R: Writing Cross Language Tools (SciPy 2016 Talk)☆20Jul 13, 2016Updated 9 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.☆1,279Dec 1, 2020Updated 5 years ago
- This project is no longer supported. A pre-configured collection of tools including Social Feed Manager and Lentil for easily building Tw…☆16Feb 9, 2018Updated 8 years ago
- BaseX Distribution Files☆22Oct 22, 2025Updated 6 months ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh.☆10Jun 10, 2025Updated 11 months ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab☆929Jun 13, 2018Updated 7 years ago
- Scan, index, and archive all of your paper documents☆7,920Apr 6, 2021Updated 5 years ago
- Python bindings to the Tesseract API☆66Jul 5, 2016Updated 9 years ago
- Image comparison QA tool for digital preservation workflows.☆14Nov 17, 2014Updated 11 years ago
- Python: Bad Ideas☆11May 29, 2017Updated 8 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six.☆5,292Dec 7, 2022Updated 3 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op…☆25Jul 5, 2015Updated 10 years ago
- Amazing language representation☆79Dec 11, 2014Updated 11 years ago
- A canonical Sinatra Pastie-like application running on RethinkDB☆41May 6, 2022Updated 4 years ago
- Euler is an open source logic toolkit for aligning taxonomies and visualizing the results; see http://sysbio.oxfordjournals.org/cgi/repri…☆21Jun 14, 2023Updated 2 years ago
- Documentation for Project Electron☆14Dec 2, 2024Updated last year
- World Catalog of Ants☆20Jan 15, 2026Updated 4 months ago
- Managing text data from the commandline☆14Feb 25, 2017Updated 9 years ago
- A middleware for RPC and Pub/Sub communication styles☆24Apr 20, 2026Updated last month
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.☆1,653May 7, 2026Updated 2 weeks ago
- Goobi workflow - Workflow management software for digitisation projects used in more than 80 cultural heritage institutions in at least 1…☆63May 13, 2026Updated last week
- Dot notation object for Python☆11Apr 13, 2026Updated last month
- A simple viewer and inspection tool for text boxes in PDF documents☆96Mar 7, 2022Updated 4 years ago
- A zero-shot captcha solver.☆16Dec 22, 2023Updated 2 years ago
- CI scripts for validating and processing metadata☆11Dec 7, 2019Updated 6 years ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba…☆13Dec 13, 2022Updated 3 years ago
- Trusty URI specification☆21Feb 23, 2015Updated 11 years ago