mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
☆4,486 Updated this week
Alternatives and similar repositories for doctr:
Users who are interested in doctr are comparing it to the libraries listed below.
- A Repo For Document AI ☆2,764 Updated this week
- ☆939 Updated 6 months ago
- An easy way to extract information from documents ☆1,744 Updated last year
- A Unified Toolkit for Deep Learning Based Document Image Analysis ☆5,143 Updated 7 months ago
- A curated list of resources for Document Understanding (DU) topic ☆1,387 Updated last year
- OCR engine for all the languages ☆796 Updated this week
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,543 Updated 9 months ago
- Mindee API Helper Library for Python ☆40 Updated this week
- Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022 ☆6,122 Updated 8 months ago
- Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes ☆399 Updated last month
- This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table … ☆1,516 Updated 3 years ago
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and … ☆26,111 Updated 6 months ago
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆6,783 Updated last week
- A synthetic data generator for text recognition ☆3,436 Updated 8 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆16,979 Updated this week
- Links to awesome OCR projects ☆2,938 Updated 8 months ago
- Mindee API Helper Library for Node.js ☆25 Updated last week
- Deep neural network to extract intelligent information from invoice documents. ☆2,579 Updated 10 months ago
- A Python library to extract tabular data from PDFs ☆3,222 Updated this week
- Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM… ☆4,069 Updated 2 weeks ago
- Line based ATR Engine based on OCRopy ☆1,127 Updated 2 weeks ago
- Improved file parsing for LLM’s ☆2,877 Updated 4 months ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆4,397 Updated this week
- Transforms PDF, Documents and Images into Enriched Structured Data ☆5,941 Updated last year
- A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model. ☆1,444 Updated 7 months ago
- img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing ☆688 Updated last month
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,805 Updated 2 months ago
- Library used to deskew a scanned document ☆447 Updated this week
- Data processing with ML, LLM and Vision LLM ☆4,431 Updated this week
- CORD: A Consolidated Receipt Dataset for Post-OCR Parsing ☆417 Updated 2 years ago