Franky1 / Tesseract-OCR-5-Docker
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
☆37 Updated 2 weeks ago
Alternatives and similar repositories for Tesseract-OCR-5-Docker
Users who are interested in Tesseract-OCR-5-Docker are comparing it to the repositories listed below.
- OCRmyPDF EasyOCR plugin ☆84 Updated last month
- mrkdwn_analysis is a Python library for analyzing Markdown files. It extracts and categorizes Markdown elements like headers, sections, l… ☆37 Updated last month
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR ☆109 Updated this week
- A curated list of resources around PDF files ☆131 Updated 9 months ago
- By default FastAPI uses CDN for swagger ui assets, with this repository you can use it offline. ☆17 Updated last year
- Python-tesseract is an optical character recognition (OCR) tool for python ☆138 Updated 6 years ago
- A line-based framework to detect and extract tabular data in JSON format from raster images using computer vision and Tesseract OCR. ☆57 Updated last year
- Fast and memory-efficient Python PDF Parser based on xpdf sources ☆42 Updated last year
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular… ☆70 Updated last month
- Library used to deskew a scanned document ☆461 Updated 2 weeks ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆136 Updated this week
- Object Detection Model for Scanned Documents ☆93 Updated 2 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆69 Updated last month
- Detect and read handwritten words on scanned pages. ☆119 Updated last year
- Python package undouble is to detect (near-)identical images. ☆50 Updated 3 weeks ago
- 🚂 Fine-tune OpenAI models for text classification, question answering, and more ☆16 Updated 2 years ago
- Data extraction with Donut ML model ☆57 Updated 9 months ago
- Demos of ChatGPT's function calling/structured data support. ☆24 Updated last year
- Python API for PDF documents ☆121 Updated 8 months ago
- Document image dewarping library using a cubic sheet model ☆155 Updated this week
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆51 Updated 7 months ago
- Parallel and LAzY Analyzer for PDFs 🏖️ ☆27 Updated this week
- OCR using Python, Tesseract and OpenCV in a Docker container ☆124 Updated 2 years ago
- Image pre-processing and OCR techniques with OpenCV and PyTesseract ☆21 Updated 3 years ago
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆391 Updated 9 months ago
- Logical structure analysis for visually structured documents ☆89 Updated 2 years ago
- Simple Pytorch framework to train OCRs. Supports CRNNs, Attention, CTC and Cross Entropy Loss. ☆80 Updated last year
- ☆34 Updated 4 years ago
- Tutorial on how to deskew (straighten) text images ☆51 Updated 3 years ago