Franky1 / Tesseract-OCR-5-Docker
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
☆41Updated last month
Alternatives and similar repositories for Tesseract-OCR-5-Docker
Users that are interested in Tesseract-OCR-5-Docker are comparing it to the libraries listed below
- OCRmyPDF EasyOCR plugin☆89Updated 4 months ago
- Jupyter Docker stack image with pre-installed scraper tools and libraries☆27Updated 2 years ago
- A post-processing tool for scanned sheets of paper.☆82Updated last year
- Detect and read handwritten words on scanned pages.☆125Updated 2 years ago
- Document image dewarping library using a cubic sheet model☆167Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based☆326Updated last year
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip!☆295Updated 2 months ago
- Demos, examples and utilities using PyMuPDF☆676Updated last year
- Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Characte…☆211Updated 7 months ago
- CLI tool to extract (meta)data from PDF and manipulate PDF files☆167Updated this week
- Python bindings to connect to a LibreTranslate API☆118Updated 5 months ago
- A curated list of resources around PDF files☆137Updated last year
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR☆141Updated this week
- LLM plugin for embeddings using sentence-transformers☆70Updated 3 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis☆365Updated 2 years ago
- OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.☆110Updated 2 years ago
- Efficient OCR engine for receipt image processing using Python, FastAPI, and Tesseract☆110Updated 8 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server☆64Updated 11 months ago
- Recognition of handwritten text using CRAFT text detection and TrOCR☆26Updated 2 years ago
- A Python library to extract tabular data from PDFs☆66Updated 4 months ago
- Object Detection Model for Scanned Documents☆94Updated 5 months ago
- `pdfstructure` detects, splits and organizes the documents text content into its natural structure as envisioned by the author.☆106Updated last year
- Library used to deskew a scanned document☆475Updated 2 weeks ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation☆142Updated 3 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.☆395Updated last year
- Train Tesseract LSTM with make☆688Updated 3 months ago
- ocr-docker is a small, Flask-powered web app that extracts text from images and PDF documents using OCR☆60Updated 5 months ago
- Simple package to extract text with coordinates from programmatic PDFs☆166Updated last week
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆70Updated last week
- mrkdwn_analysis is a Python library for analyzing Markdown files. It extracts and categorizes Markdown elements like headers, sections, l…☆41Updated 4 months ago