Python script to do PDF OCR conversion using Tesseract
☆371Jun 2, 2023Updated 3 years ago
Alternatives and similar repositories for pypdfocr
Users that are interested in pypdfocr are comparing it to the libraries listed below.
- (Python) Execute tesseract OCR on a multi-page PDF.☆19Jun 30, 2023Updated 2 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!☆303May 24, 2026Updated 3 weeks ago
- Node.js implementation of the PirateBox Server inspired by David Darts☆15Dec 8, 2015Updated 10 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.☆2,257Jun 24, 2022Updated 3 years ago
- Extract tables from scanned image PDFs using Optical Character Recognition.☆277Jun 9, 2020Updated 6 years ago
- My awesome dotfiles.☆25Sep 15, 2021Updated 4 years ago
- Physical unit systems (Metric, English, Natural, Planck, etc...)☆20Sep 24, 2025Updated 8 months ago
- Simple tape imaging and extraction tool☆29Jan 31, 2020Updated 6 years ago
- A package of code for quickly and easily annotating videos in a web browser☆22Apr 17, 2012Updated 14 years ago
- This is about my idea of Knowledge-Graph-Building-Blocks as building blocks for knowledge graph applications.☆15Nov 23, 2022Updated 3 years ago
- The computer language for describing species and phenotypes☆15May 18, 2026Updated 3 weeks ago
- Implementing semantically rich NeXML I/O in R☆15May 6, 2024Updated 2 years ago
- A LDP Implementation backed by BlazeGraph☆26Oct 31, 2017Updated 8 years ago
- Optical Character Recognition in Python.☆44Dec 30, 2018Updated 7 years ago
- Packager for the Corpus of Mid-20th Century Hong Kong Cantonese (《香港二十世紀中期粵語語料庫》)☆16Apr 12, 2016Updated 10 years ago
- Decentralizing exports in Julia☆15Jul 29, 2020Updated 5 years ago
- Models, vocabularies and behaviours for Hyrax applications.☆11Sep 21, 2023Updated 2 years ago
- Data Generator for Training Tesseract OCR☆10Jul 7, 2020Updated 5 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six.☆5,291Dec 7, 2022Updated 3 years ago
- Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use op…☆25Jul 5, 2015Updated 10 years ago
- Harvard University Library Cloud API☆11Feb 25, 2022Updated 4 years ago
- Divmod Axiom is an object database, or alternatively, an object-relational mapper, implemented on top of Python.☆25Jan 13, 2023Updated 3 years ago
- An unofficial mirror of the pdftk source code.☆21Dec 20, 2016Updated 9 years ago
- 8-bit raspberry pi game☆14Jan 19, 2017Updated 9 years ago
- An sshfs profile management script written in bash☆25Nov 23, 2022Updated 3 years ago
- 💬 A small event handling library on top of the Slack RTM API.☆15Jan 12, 2020Updated 6 years ago
- Goobi workflow - Workflow management software for digitisation projects used in more than 80 cultural heritage institutions in at least 1…☆64Updated this week
- A simple viewer and inspection tool for text boxes in PDF documents☆96Mar 7, 2022Updated 4 years ago
- CI scripts for validating and processing metadata☆11Dec 7, 2019Updated 6 years ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- django CMS Icon adds capabilities to implement Font or SVG icons as plugins into your project.☆19May 4, 2026Updated last month
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba…☆13May 15, 2026Updated 3 weeks ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files☆10,036Updated this week
- Field Service management System☆24Dec 22, 2024Updated last year
- Invenio-Records is a metadata storage module.☆12May 28, 2026Updated 2 weeks ago
- Command-line tool for building Gephi force-directed graph diagrams.☆10Nov 10, 2017Updated 8 years ago
- Markdown for Linked Data☆17Apr 4, 2015Updated 11 years ago
- Better passwords by combining random words.☆13Oct 24, 2020Updated 5 years ago
- A CLI for the API of LinkAce (https://github.com/Kovah/LinkAce)☆11Jan 2, 2025Updated last year