Franky1 / Tesseract-OCR-5-Docker
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
☆47Updated last week
Alternatives and similar repositories for Tesseract-OCR-5-Docker
Users who are interested in Tesseract-OCR-5-Docker are comparing it to the libraries listed below.
- OCRmyPDF EasyOCR plugin☆97Updated 4 months ago
- A Python asyncio wrapper for Tesseract-OCR.☆27Updated this week
- A curated list of resources around PDF files☆149Updated last year
- A Python library to extract tabular data from PDFs☆66Updated 9 months ago
- Benchmarking PDF libraries☆320Updated 6 months ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based☆328Updated 2 years ago
- Convenience Docker images for Apache Tika Server☆229Updated 2 weeks ago
- Pipeline for converting PDFs to raw text with PaddleOCR☆23Updated 2 years ago
- Document scanner written in Python using OpenCV and other Computer Vision libraries. Scans image of documents and creates scanned version…☆31Updated 11 months ago
- A post-processing tool for scanned sheets of paper.☆85Updated last year
- Streamlit PDF viewer☆191Updated last week
- Library used to deskew a scanned document☆497Updated this week
- Translate HTML using Argos Translate☆57Updated 2 years ago
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation☆154Updated 8 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server☆70Updated last year
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images.☆201Updated this week
- A Python-based HTML to text conversion library, command line client and Web service.☆331Updated last month
- A Python tool to help extracting information from structured PDFs.☆427Updated last month
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆80Updated 2 weeks ago
- Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Characte…☆237Updated last year
- Redis Queue Dashboard based on FastAPI☆121Updated last month
- A task queue based on redis that can serve as a peak shaver and protect your app.☆42Updated 3 years ago
- Python library to extract tabular data from images and scanned PDFs☆285Updated last year
- Document image dewarping library using a cubic sheet model☆194Updated this week
- Python binding to Poppler-cpp pdf library☆114Updated last year
- Markdown to pdf renderer☆126Updated 2 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters☆158Updated 3 weeks ago
- Deidentify people's names and gender specific pronouns☆44Updated 8 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.☆406Updated last year
- Quickly check whether there is a visible difference between two PDFs.☆72Updated last month