IgorMeloS / OCR
Image pre-processing and OCR techniques with OpenCV and PyTesseract
☆21Updated 3 years ago
Alternatives and similar repositories for OCR
Users who are interested in OCR are comparing it to the libraries listed below
- Repository for deepdoctection tutorial notebooks☆45Updated 5 months ago
- A repository for creating an ONNX embedding model, with sample code for consuming it☆30Updated last year
- DocLLM: A layout-aware generative language model for multimodal document understanding☆126Updated last year
- Pipeline for converting PDFs to raw text with PaddleOCR☆23Updated last year
- Handwritten text detection in document images using Detectron2☆20Updated 3 years ago
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format.☆22Updated last year
- ☆180Updated last month
- Build document-native LLM applications☆53Updated 8 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis☆340Updated 2 years ago
- ☆122Updated 2 months ago
- ☆22Updated last year
- ☆115Updated last week
- Document Layout Analysis☆373Updated this week
- Run embedding models using ONNX☆33Updated last year
- Logical structure analysis for visually structured documents☆89Updated 2 years ago
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images. In modern times, more a…☆58Updated 2 years ago
- A CLI tool for managing OpenAI batch processing jobs with ease.☆35Updated 2 weeks ago
- Data extraction with Donut ML model☆57Updated 9 months ago
- Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.☆27Updated 2 years ago
- Document Image Binarization☆78Updated 7 months ago
- ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for us…☆58Updated 2 months ago
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi…☆36Updated 2 months ago
- A Streamlit component integrating Label Studio Frontend in Streamlit applications☆72Updated 10 months ago
- GLiNER model in a FastAPI microservice.☆44Updated 5 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip…☆69Updated last month
- ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation☆136Updated this week
- ☆49Updated 10 months ago
- Apply different text recognition services to images of handwritten documents.☆178Updated 2 years ago
- ☆80Updated 3 years ago
- Adobe PDFServices python SDK Samples☆150Updated this week