PedroBarcha / old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. There are several versions of the set, differing in resolution and binarization. Noisy and denoised versions of the set, produced by several methods, will eventually be uploaded.
☆12, updated 7 years ago
Alternatives and similar repositories for old-books-dataset
Users interested in old-books-dataset compare it to the repositories listed below:
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (☆27, updated 6 years ago)
- DL models that take a document image file as input, locate the position of paragraphs, lines, images, etc. with their labels and confiden… (☆26, updated 4 years ago)
- Key Information Extraction From Documents: Evaluation And Generator (☆20, updated 4 years ago)
- Text and Layout Document Image Understanding. LayoutLM (☆23, updated 3 years ago)
- Repo to host the forms dataset (☆15, updated 4 years ago)
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR (☆15, updated 3 years ago)
- Detect textlines in document images (☆92, updated 10 months ago)
- A Dense Text Detection model using Receptive Field Blocks (☆31, updated 2 years ago)
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" (☆33, updated 3 years ago)
- DFKI Layout Detection for OCR-D (☆47, updated 2 weeks ago)
- Python tools for Tesseract OCR training (☆25, updated 2 years ago)
- Sample implementation of OCR metrics (CER, WER) calculation with TesseractOCR and fastwer (☆28, updated 3 years ago)
- Repository of the back end implementation of DivaServices (☆14, updated 5 years ago)
- Close-Domain fine-tuning for table detection (☆72, updated 2 years ago)
- OCR-D-compliant page segmentation (☆67, updated last month)
- Handwritten text recognition using transformers (☆157, updated 9 months ago)
- An OCR system using CRAFT for text detection and MORAN for recognition (☆19, updated this week)
- OCR & Ground Truth Resources (☆75, updated 2 years ago)
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis (☆15, updated 3 years ago)
- Evaluation of the LayoutLM model on the CORD dataset (☆32, updated 3 years ago)
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (☆32, updated 2 years ago)