virantha / pypdfocr
Python script to do PDF OCR conversion using Tesseract
☆373 Updated last year
Alternatives and similar repositories for pypdfocr:
Users that are interested in pypdfocr are comparing it to the libraries listed below
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆375 Updated 5 months ago
- Extract tables from PDF pages. ☆283 Updated 4 years ago
- The simplest way to extract text from PDFs in Python ☆428 Updated 2 years ago
- A post-processing tool for scanned sheets of paper. ☆1,055 Updated 6 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆214 Updated 5 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆183 Updated 3 months ago
- A fast and friendly PDF scraping library. ☆774 Updated last year
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched ☆260 Updated 8 years ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆931 Updated 6 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,273 Updated 4 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆94 Updated 2 years ago
- OCR engine for all the languages ☆767 Updated this week
- Python module to drive the awesome pdftk binary. ☆148 Updated last year
- port of PDF fdfgen library for filling in PDF forms to Python ☆171 Updated 2 months ago
- This is a tutorial on getting OCR running on a simple web server, using python, flask, tesseract-ocr, and leptonica ☆258 Updated 4 years ago
- Python library to programmatically create epub files ☆283 Updated last year
- Adds text to PDF files using the cuneiform OCR software ☆325 Updated 3 years ago
- Ocular is a state-of-the-art historical OCR system. ☆258 Updated 7 months ago
- Python binding to libpoppler with focus on text extraction ☆97 Updated 3 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,230 Updated 2 years ago
- A Python wrapper for the tesseract-ocr API ☆2,042 Updated last month
- python app/framework for 'all things ISBN' including metadata, descriptions, covers... ☆216 Updated last year
- python library to validate, clean, transform and get metadata of ISBN strings (for devs). ☆237 Updated 5 months ago
- Extract tables from scanned image PDFs using Optical Character Recognition. ☆271 Updated 4 years ago
- Extensible RSS 2.0 Feed Generator written in Python ☆187 Updated last year
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,530 Updated 9 months ago
- Mapping photos of Old New York ☆287 Updated last month
- Python E-book library for handling books in EPUB2/EPUB3 format ☆1,534 Updated 5 months ago
- Parse human-readable date/time strings ☆696 Updated last week
- Python wrapper for Pandoc, the universal document converter. ☆213 Updated 8 years ago