virantha/pypdfocr

virantha / pypdfocr

Python script to do PDF OCR conversion using Tesseract

☆371

Alternatives and similar repositories for pypdfocr

Users who are interested in pypdfocr are comparing it to the libraries listed below.

qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆19 · Jun 30, 2023 · Updated 3 years ago
LeoFCardoso / pdf2pdfocr
A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!
☆303 · May 24, 2026 · Updated last month
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,256 · Jun 24, 2022 · Updated 4 years ago
PRImA-Research-Lab / prima-aletheia-web-emop
Web-based page layout editor created for EMOP (Early Modern OCR Project).
☆11 · May 21, 2021 · Updated 5 years ago
KBNLresearch / tapeimgr
Simple tape imaging and extraction tool
☆29 · Jan 31, 2020 · Updated 6 years ago
artunit / ossocr
gathering point for open source OCR scripts and diffs
☆43 · Jun 27, 2014 · Updated 12 years ago
thomashuston / HTML5-Annotation-Tool
A package of code for quickly and easily annotating videos in a web browser
☆22 · Apr 17, 2012 · Updated 14 years ago
LarsVogt / Knowledge-Graph-Building-Blocks
This is about my idea of Knowledge-Graph-Building-Blocks as building blocks for knowledge graph applications.
☆16 · Nov 23, 2022 · Updated 3 years ago
skx / aws-utils
A small collection of AWS utilities, packaged as a single standalone binary.
☆13 · Aug 23, 2023 · Updated 2 years ago
sergeitarasov / PhenoScript
The computer language for describing species and phenotypes
☆15 · May 18, 2026 · Updated last month
ropensci / RNeXML
Implementing semantically rich NeXML I/O in R
☆15 · May 6, 2024 · Updated 2 years ago
cavendish-ldp / cavendish
A LDP Implementation backed by BlazeGraph
☆26 · Oct 31, 2017 · Updated 8 years ago
groutr / conda-tools
☆10 · Jul 15, 2019 · Updated 6 years ago
jlsutherland / doc2text
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
☆1,279 · Dec 1, 2020 · Updated 5 years ago
tdwg / ac
Audiovisual Core
☆15 · Jun 17, 2026 · Updated 2 weeks ago
ncouture / python-search-engine
Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh.
☆10 · Jun 10, 2025 · Updated last year
the-paperless-project / paperless
Scan, index, and archive all of your paper documents
☆7,916 · Apr 6, 2021 · Updated 5 years ago
gregjurman / tesserwrap
Python bindings to the Tesseract API
☆66 · Jul 5, 2016 · Updated 9 years ago
samvera-labs / dog_biscuits
Models, vocabularies and behaviours for Hyrax applications.
☆11 · Sep 21, 2023 · Updated 2 years ago
seperman / bad-ideas
Python: Bad Ideas
☆11 · May 29, 2017 · Updated 9 years ago
euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,286 · Dec 7, 2022 · Updated 3 years ago
harvard-library / librarycloud
Harvard University Library Cloud API
☆11 · Feb 25, 2022 · Updated 4 years ago
twisted / axiom
Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python.
☆25 · Jan 13, 2023 · Updated 3 years ago
EulerProject / EulerX
Euler is an open source logic toolkit for aligning taxonomies and visualizing the results; see http://sysbio.oxfordjournals.org/cgi/repri…
☆21 · Jun 14, 2023 · Updated 3 years ago
blasscoc / FocalMechClassifier
Codes for classifying to focal mechanisms of earthquakes.
☆15 · Feb 9, 2021 · Updated 5 years ago
cardo-org / Rembus.jl
A middleware for RPC and Pub/Sub communication styles
☆23 · Jun 22, 2026 · Updated last week
lucasbrambrink / snake_8x8_dotmatrix
8-bit raspberry pi game
☆14 · Jan 19, 2017 · Updated 9 years ago
reillysiemens / layabout
💬 A small event handling library on top of the Slack RTM API.
☆15 · Jan 12, 2020 · Updated 6 years ago
jrabbit / taskd-client-py
A python client for taskd
☆18 · Dec 8, 2022 · Updated 3 years ago
intranda / goobi-workflow
Goobi workflow - Workflow management software for digitisation projects used in more than 80 cultural heritage institutions in at least 1…
☆64 · Updated this week
seperman / dotobject
Dot notation object for Python
☆11 · Apr 13, 2026 · Updated 2 months ago
patrickpclee / codfs
CodFS: An Erasure-Coded Clustered Storage System for Efficient Updates and Recovery
☆10 · Mar 31, 2015 · Updated 11 years ago
WZBSocialScienceCenter / pdf2xml-viewer
A simple viewer and inspection tool for text boxes in PDF documents
☆96 · Mar 7, 2022 · Updated 4 years ago
LudwigStumpp / zero-shot-captcha-solver
A zero-shot captcha solver.
☆16 · Dec 22, 2023 · Updated 2 years ago
JuliaSIMD / TriangularSolve.jl
rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)
☆11 · Nov 18, 2024 · Updated last year
library-ucsb / metadata-ci
CI scripts for validating and processing metadata
☆11 · Dec 7, 2019 · Updated 6 years ago
django-cms / djangocms-icon
django CMS Icon adds capabilities to implement Font or SVG icons as plugins into your project.
☆19 · May 4, 2026 · Updated 2 months ago
RvanVeenendaal / Spreadsheet-Complexity-Analyser
This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba…
☆13 · May 15, 2026 · Updated last month
trustyuri / trustyuri-spec
Trusty URI specification
☆22 · Feb 23, 2015 · Updated 11 years ago