PedroBarcha / old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) will eventually be uploaded.
☆12Updated 7 years ago
Alternatives and similar repositories for old-books-dataset:
Users who are interested in old-books-dataset are comparing it to the repositories listed below
- Key Information Extraction From Documents: Evaluation And Generator☆20Updated 4 years ago
- ☆9Updated 5 years ago
- ☆22Updated 2 years ago
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction☆32Updated 2 years ago
- DIAR software for synthetic document image and groundtruth generation, with various degradation models for data augmentation☆119Updated last year
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR☆15Updated 3 years ago
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing"☆33Updated 3 years ago
- OCR & Ground Truth Resources☆75Updated 3 years ago
- Pretrained mixed models to be used with Calamari.☆62Updated 7 months ago
- ☆69Updated 7 years ago
- TextTron is a simple light-weight image processing based text detector for document images.☆52Updated 4 years ago
- Document Image Binarization☆78Updated 6 months ago
- Text and Layout Document Image Understanding. LayoutLM☆23Updated 3 years ago
- CVPR 2022: Table Structure Recognition☆39Updated 3 years ago
- PyTorch Implementation of TableNet☆65Updated 3 years ago
- Detect textlines in document images☆93Updated 11 months ago
- PyTorch implementation of our paper: Adapting OCR with Limited Labels☆60Updated last year
- Using an FCN to segment the book's content and background, then dewarping the pages☆19Updated 3 years ago
- ☆34Updated 4 years ago
- Extraction of meaningful instances from document images with a Chargrid model☆34Updated 3 years ago
- Document Visual Question Answering☆116Updated 4 years ago
- A Dense Text Detection model using Receptive Field Blocks☆31Updated 2 years ago
- TensorFlow implementation of a segmentation system for document images.☆34Updated 6 years ago
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction☆27Updated 6 years ago
- A U-Net-based deep learning model to remove line/box/spurious artifacts from text images. Unsupervised training.☆59Updated 5 years ago
- TableNet Implementation on PyTorch☆147Updated 2 years ago
- ☆127Updated 5 years ago
- Publicly released code for the LAMBERT model☆103Updated 3 years ago
- ☆138Updated last year
- ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for us…☆57Updated last month