Franky1 / Tesseract-OCR-5-Docker
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
☆39 Updated 3 weeks ago
Alternatives and similar repositories for Tesseract-OCR-5-Docker
Users who are interested in Tesseract-OCR-5-Docker are comparing it to the libraries listed below:
- OCRmyPDF EasyOCR plugin ☆86 Updated 2 months ago
- OCR using Python, Tesseract and OpenCV in a Docker container ☆124 Updated 2 years ago
- Document image dewarping library using a cubic sheet model ☆160 Updated this week
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆141 Updated last month
- Document Image Binarization ☆77 Updated 8 months ago
- Library used to deskew a scanned document ☆470 Updated this week
- A post-processing tool for scanned sheets of paper. ☆82 Updated last year
- Object Detection Model for Scanned Documents ☆93 Updated 3 months ago
- Python binding to Poppler-cpp pdf library ☆110 Updated 9 months ago
- Train Tesseract LSTM with GUI on Windows ☆39 Updated last year
- ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for us… ☆58 Updated 3 months ago
- Checkbox Detection Model for Scanned Documents ☆76 Updated 3 months ago
- Python interface to Libcpdf ☆11 Updated 10 months ago
- Blazing fast fuzzy text search for Python. ☆44 Updated 2 months ago
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR ☆127 Updated this week
- Recognition of handwritten text using CRAFT text detection and TrOCR ☆26 Updated 2 years ago
- Simple Pytorch framework to train OCRs. Supports CRNNs, Attention, CTC and Cross Entropy Loss. ☆81 Updated last year
- Sentence Transformers API: An OpenAI compatible embedding API server ☆60 Updated 9 months ago
- Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Characte… ☆206 Updated 5 months ago
- A Python asyncio wrapper for Tesseract-OCR. ☆26 Updated 8 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆349 Updated 2 years ago
- mrkdwn_analysis is a Python library for analyzing Markdown files. It extracts and categorizes Markdown elements like headers, sections, l… ☆39 Updated 2 months ago
- YOLOv11 trained on DocLayNet dataset. ☆44 Updated 7 months ago
- OCR engine for all the languages ☆841 Updated last week
- Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents. ☆190 Updated this week
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 Updated last week
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex. ☆42 Updated last year
- Python API for PDF documents ☆122 Updated 9 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆133 Updated this week