Python script to do PDF OCR conversion using Tesseract
☆375 · Jun 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for pypdfocr
Users who are interested in pypdfocr are comparing it to the libraries listed below.
- (Python) Execute Tesseract OCR on a multi-page PDF. ☆19 · Jun 30, 2023 · Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆409 · Aug 10, 2024 · Updated last year
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Uses only open source tools. Please tip! ☆303 · May 25, 2025 · Updated 9 months ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,257 · Jun 24, 2022 · Updated 3 years ago
- Web-based page layout editor created for EMOP (Early Modern OCR Project). ☆11 · May 21, 2021 · Updated 4 years ago
- Gathering point for open source OCR scripts and diffs ☆43 · Jun 27, 2014 · Updated 11 years ago
- A package of code for quickly and easily annotating videos in a web browser ☆22 · Apr 17, 2012 · Updated 13 years ago
- A Calagator app built in React Native. It works on both iOS and Android ☆14 · Apr 24, 2016 · Updated 9 years ago
- Python-based tools for document analysis and OCR ☆3,472 · May 22, 2021 · Updated 4 years ago
- PyTorch library for synthesizing programs from natural language ☆18 · Jul 25, 2024 · Updated last year
- An LDP implementation backed by BlazeGraph ☆26 · Oct 31, 2017 · Updated 8 years ago
- Examples of implementing OCR (Optical Character Recognition) using Tesseract in Python ☆64 · Jun 4, 2023 · Updated 2 years ago
- This project is no longer supported. A pre-configured collection of tools including Social Feed Manager and Lentil for easily building Tw… ☆16 · Feb 9, 2018 · Updated 8 years ago
- BaseX Distribution Files ☆21 · Oct 22, 2025 · Updated 4 months ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆10 · Jun 10, 2025 · Updated 9 months ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to GNOME's GitLab ☆931 · Jun 13, 2018 · Updated 7 years ago
- Image comparison QA tool for digital preservation workflows. ☆14 · Nov 17, 2014 · Updated 11 years ago
- Python: Bad Ideas ☆11 · May 29, 2017 · Updated 8 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,303 · Dec 7, 2022 · Updated 3 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op… ☆25 · Jul 5, 2015 · Updated 10 years ago
- Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python. ☆25 · Jan 13, 2023 · Updated 3 years ago
- Documentation for Project Electron ☆14 · Dec 2, 2024 · Updated last year
- An unofficial mirror of the pdftk source code. ☆21 · Dec 20, 2016 · Updated 9 years ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,650 · Mar 7, 2026 · Updated last week
- Python wrapper for xpdf ☆19 · Nov 28, 2019 · Updated 6 years ago
- Tools to mirror container images from Docker Hub ☆14 · Apr 27, 2017 · Updated 8 years ago
- A Python client for taskd ☆18 · Dec 8, 2022 · Updated 3 years ago
- Docker container for Verdaccio with GitLab authentication ☆15 · Jan 16, 2018 · Updated 8 years ago
- Goobi workflow - Workflow management software for digitisation projects used in more than 80 cultural heritage institutions in at least 1… ☆63 · Mar 11, 2026 · Updated last week
- Convert old rotary-dial telephones into story boxes ☆23 · Dec 7, 2025 · Updated 3 months ago
- CodFS: An Erasure-Coded Clustered Storage System for Efficient Updates and Recovery ☆10 · Mar 31, 2015 · Updated 10 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 · Mar 7, 2022 · Updated 4 years ago
- A zero-shot captcha solver. ☆16 · Dec 22, 2023 · Updated 2 years ago
- A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆9,864 · Mar 11, 2026 · Updated last week
- HOCR manipulation and utility library; provides hocr2pdf binary. ☆14 · Mar 5, 2018 · Updated 8 years ago
- CTE: Contextualized Table Extraction Dataset ☆17 · Feb 23, 2023 · Updated 3 years ago
- ☆13 · Nov 11, 2025 · Updated 4 months ago
- Invenio-Records is a metadata storage module. ☆12 · Jan 27, 2026 · Updated last month
- A procedure for creating a Cisco Nexus 9000v Switch Vagrant box for the libvirt provider. ☆13 · Updated this week