Franky1 / Tesseract-OCR-5-Docker
Docker image with the latest Tesseract OCR version 5.x.x, built from source
☆ 30 · Updated last week
Related projects
Alternatives and complementary repositories for Tesseract-OCR-5-Docker
- OCRmyPDF EasyOCR plugin (☆ 53, updated 2 months ago)
- Python binding to the Poppler-cpp PDF library (☆ 98, updated 2 months ago)
- Run OCR, extract information from documents and classify them. In addition, annotate documents and build custom NLP and computer vision m… (☆ 62, updated last week)
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation (☆ 127, updated last week)
- Web interface for administering Ollama and model quantization, with public endpoints and an automated OpenAI proxy (☆ 51, updated 6 months ago)
- Record audio and save a transcription to your system's clipboard with ctranslate2 and faster-whisper (☆ 69, updated last month)
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks (☆ 116, updated last year)
- OCR using Python, Tesseract and OpenCV in a Docker container (☆ 123, updated last year)
- Extract structured data from local or remote LLM models (☆ 36, updated 5 months ago)
- Convert a PDF via OCR to a TXT file in UTF-8 encoding (☆ 141, updated last year)
- Object Detection Model for Scanned Documents (☆ 83, updated last year)
- A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition (☆ 92, updated 3 months ago)
- Access the Cohere Command R family of models (☆ 32, updated 7 months ago)
- A Python asyncio wrapper for Tesseract-OCR (☆ 22, updated 3 weeks ago)
- Code for line detection, character detection and recognition on cuneiform 2D images (☆ 30, updated 5 years ago)
- Web-based editor for subtitles and transcripts (☆ 112, updated 3 months ago)
- Python-tesseract is an optical character recognition (OCR) tool for Python (☆ 87, updated 6 years ago)
- Library used to deskew a scanned document (☆ 418, updated last month)
- Toolkit for training/converting LibreTranslate-compatible language models 🚂 (☆ 48, updated 3 weeks ago)
- Indox is an advanced search and retrieval technique that efficiently extracts data from diverse document types, including PDFs and HTML, … (☆ 16, updated 3 weeks ago)
- Speak (speech-to-text) to Ollama LLMs in any language, as a Streamlit app (☆ 37, updated 8 months ago)
- ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for us… (☆ 51, updated this week)
- A cost estimator for OpenAI API calls in tqdm loops (☆ 13, updated 7 months ago)
- A performant, high-throughput, CPU-based API for Meta's No Language Left Behind (NLLB) using CTranslate2, hosted on Hugging Face Spaces (☆ 95, updated this week)
- Fast and memory-efficient Python PDF parser based on xpdf sources (☆ 40, updated 11 months ago)
- Demos of ChatGPT's function calling/structured data support (☆ 22, updated 11 months ago)
- Document image dewarping library using a cubic sheet model (☆ 117, updated this week)
- Tutorial on how to deskew (straighten) text images (☆ 50, updated 2 years ago)
- ReadingBank: A Benchmark Dataset for Reading Order Detection (☆ 91, updated 2 months ago)