Python script to do PDF OCR conversion using Tesseract
☆371 · Jun 2, 2023 · Updated 3 years ago
Alternatives and similar repositories for pypdfocr
Users that are interested in pypdfocr are comparing it to the libraries listed below.
- (Python) Execute tesseract OCR on a multi-page PDF. ☆19 · Jun 30, 2023 · Updated 3 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆303 · May 24, 2026 · Updated last month
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,256 · Jun 24, 2022 · Updated 4 years ago
- Web-based page layout editor created for EMOP (Early Modern OCR Project). ☆11 · May 21, 2021 · Updated 5 years ago
- Simple tape imaging and extraction tool ☆29 · Jan 31, 2020 · Updated 6 years ago
- gathering point for open source OCR scripts and diffs ☆43 · Jun 27, 2014 · Updated 12 years ago
- A package of code for quickly and easily annotating videos in a web browser ☆22 · Apr 17, 2012 · Updated 14 years ago
- This is about my idea of Knowledge-Graph-Building-Blocks as building blocks for knowledge graph applications. ☆16 · Nov 23, 2022 · Updated 3 years ago
- A small collection of AWS utilities, packaged as a single standalone binary. ☆13 · Aug 23, 2023 · Updated 2 years ago
- The computer language for describing species and phenotypes ☆15 · May 18, 2026 · Updated last month
- Implementing semantically rich NeXML I/O in R ☆15 · May 6, 2024 · Updated 2 years ago
- A LDP Implementation backed by BlazeGraph ☆26 · Oct 31, 2017 · Updated 8 years ago
- ☆10 · Jul 15, 2019 · Updated 6 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,279 · Dec 1, 2020 · Updated 5 years ago
- Audiovisual Core ☆15 · Jun 17, 2026 · Updated 2 weeks ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆10 · Jun 10, 2025 · Updated last year
- Scan, index, and archive all of your paper documents ☆7,916 · Apr 6, 2021 · Updated 5 years ago
- Python bindings to the Tesseract API ☆66 · Jul 5, 2016 · Updated 9 years ago
- Models, vocabularies and behaviours for Hyrax applications. ☆11 · Sep 21, 2023 · Updated 2 years ago
- Python: Bad Ideas ☆11 · May 29, 2017 · Updated 9 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,286 · Dec 7, 2022 · Updated 3 years ago
- Harvard University Library Cloud API ☆11 · Feb 25, 2022 · Updated 4 years ago
- Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python. ☆25 · Jan 13, 2023 · Updated 3 years ago
- Euler is an open source logic toolkit for aligning taxonomies and visualizing the results; see http://sysbio.oxfordjournals.org/cgi/repri… ☆21 · Jun 14, 2023 · Updated 3 years ago
- Codes for classifying to focal mechanisms of earthquakes. ☆15 · Feb 9, 2021 · Updated 5 years ago
- A middleware for RPC and Pub/Sub communication styles ☆23 · Jun 22, 2026 · Updated last week
- 8-bit raspberry pi game ☆14 · Jan 19, 2017 · Updated 9 years ago
- 💬 A small event handling library on top of the Slack RTM API. ☆15 · Jan 12, 2020 · Updated 6 years ago
- A python client for taskd ☆18 · Dec 8, 2022 · Updated 3 years ago
- Goobi workflow - Workflow management software for digitisation projects used in more than 80 cultural heritage institutions in at least 1… ☆64 · Updated this week
- Dot notation object for Python ☆11 · Apr 13, 2026 · Updated 2 months ago
- CodFS: An Erasure-Coded Clustered Storage System for Efficient Updates and Recovery ☆10 · Mar 31, 2015 · Updated 11 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 · Mar 7, 2022 · Updated 4 years ago
- A zero-shot captcha solver. ☆16 · Dec 22, 2023 · Updated 2 years ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix) ☆11 · Nov 18, 2024 · Updated last year
- CI scripts for validating and processing metadata ☆11 · Dec 7, 2019 · Updated 6 years ago
- django CMS Icon adds capabilities to implement Font or SVG icons as plugins into your project. ☆19 · May 4, 2026 · Updated 2 months ago
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba… ☆13 · May 15, 2026 · Updated last month
- Trusty URI specification ☆22 · Feb 23, 2015 · Updated 11 years ago