IgorMeloS / OCR
Image pre-processing and OCR techniques with OpenCV and PyTesseract
☆29 Updated 3 years ago
Alternatives and similar repositories for OCR
Users who are interested in OCR are comparing it to the libraries listed below.
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation ☆154 Updated 8 months ago
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR ☆171 Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆521 Updated last year
- Checkbox Detection Model for Scanned Documents ☆91 Updated 11 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆411 Updated 3 years ago
- Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Characte… ☆241 Updated last year
- Library used to deskew a scanned document ☆498 Updated this week
- Document Layout Analysis ☆395 Updated this week
- ☆392 Updated 2 years ago
- Object Detection Model for Scanned Documents ☆94 Updated 11 months ago
- Python library to extract tabular data from images and scanned PDFs ☆285 Updated last year
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆202 Updated 11 months ago
- Docscan is a document scanner. Take a photo of your documents and frame it. ☆106 Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆137 Updated 2 years ago
- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition ☆282 Updated 3 years ago
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images In modern times, more a… ☆63 Updated 3 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆80 Updated this week
- DocILE: Document Information Localization and Extraction Benchmark ☆139 Updated last year
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t… ☆37 Updated last year
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆218 Updated 2 years ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆345 Updated 2 years ago
- ☆201 Updated last week
- Document image dewarping library using a cubic sheet model ☆197 Updated last week
- Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understan… ☆360 Updated 3 years ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 9 months ago
- Document Image Binarization ☆79 Updated last year
- Recognition of handwritten text using CRAFT text detection and TrOCR ☆26 Updated 3 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆115 Updated last year
- Repository for deepdoctection tutorial notebooks ☆50 Updated last month
- ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for us… ☆62 Updated 10 months ago