PedroBarcha / old-books-dataset
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets, produced by several methods, will eventually be uploaded.
☆12, updated 7 years ago
Related projects
Alternatives and complementary repositories for old-books-dataset
- DFKI Layout Detection for OCR-D (☆47, updated 2 weeks ago)
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR (☆15, updated 2 years ago)
- DL models that take a document image file as input, locate the position of paragraphs, lines, images, etc. with their labels and confiden… (☆26, updated 3 years ago)
- Handwritten text recognition using transformers. (☆154, updated 3 months ago)
- Repository of the back end implementation of DivaServices (☆14, updated 5 years ago)
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" (☆33, updated 2 years ago)
- Easter2.0: IMPROVING CONVOLUTIONAL MODELS FOR HANDWRITTEN TEXT RECOGNITION (☆77, updated last year)
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (☆28, updated 5 years ago)
- TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network. (☆56, updated 3 years ago)
- [ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral) (☆38, updated last year)
- OCR-D-compliant page segmentation (☆67, updated 2 months ago)
- OCR & Ground Truth Resources (☆74, updated 2 years ago)
- Detect textlines in document images (☆90, updated 5 months ago)
- Pretrained mixed models to be used with Calamari. (☆58, updated last month)
- A Dense Text Detection model using Receptive Field Blocks (☆31, updated 2 years ago)
- Pytorch implementation of our paper: Adapting OCR with Limited Labels (☆57, updated 10 months ago)
- Document Visual Question Answering (☆110, updated 4 years ago)
- TensorFlow implementation of a segmentation system for document images. (☆34, updated 6 years ago)
- TextTron is a simple light-weight image processing based text detector for document images. (☆50, updated 3 years ago)
- Document Image Binarization (☆73, updated last month)
- DIAR software for synthetic document image and groundtruth generation, with various degradation models for data augmentation (☆116, updated 11 months ago)
- Tutorial on how to deskew (straighten) text images (☆50, updated 2 years ago)
- Using FCN to segment the book's content and background, then dewarping the pages (☆18, updated 3 years ago)
- Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents" (☆35, updated last year)