PedroBarcha / old-books-dataset
Old book pages with ground truth, formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) are to be uploaded later.
☆12 · Updated 7 years ago
Alternatives and similar repositories for old-books-dataset:
Users who are interested in old-books-dataset compare it to the repositories listed below.
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR ☆15 · Updated 3 years ago
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" ☆33 · Updated 2 years ago
- ☆21 · Updated 2 years ago
- [ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral) ☆38 · Updated last year
- Text and Layout Document Image Understanding. LayoutLM ☆21 · Updated 3 years ago
- Easter2.0: IMPROVING CONVOLUTIONAL MODELS FOR HANDWRITTEN TEXT RECOGNITION ☆78 · Updated last year
- DFKI Layout Detection for OCR-D ☆47 · Updated 2 months ago
- ☆15 · Updated 6 months ago
- ☆15 · Updated 4 years ago
- ☆73 · Updated 2 years ago
- Document Image Binarization ☆75 · Updated 3 months ago
- DL models that take a document image file as input, locate the position of paragraphs, lines, images, etc. with their labels and confiden… ☆26 · Updated 4 years ago
- OCR & Ground Truth Resources ☆74 · Updated 2 years ago
- Detect handwritten words (neural network based). ☆67 · Updated 2 years ago
- Key Information Extraction From Documents: Evaluation And Generator ☆20 · Updated 3 years ago
- ☆136 · Updated 10 months ago
- Detect textlines in document images ☆91 · Updated 7 months ago
- Repository of the back end implementation of DivaServices ☆14 · Updated 5 years ago
- [ICDAR 2023] (Oral) An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation ☆70 · Updated 4 months ago
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction ☆28 · Updated 5 years ago
- ☆25 · Updated 4 years ago
- OCR-D-compliant page segmentation ☆67 · Updated 4 months ago
- Pretrained mixed models to be used with Calamari. ☆60 · Updated 3 months ago
- Extraction of meaningful instances from document images with a Chargrid model ☆34 · Updated 3 years ago
- ☆33 · Updated 4 years ago
- A Dense Text Detection model using Receptive Field Blocks ☆31 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- Line Segmentation of Handwritten Documents using the A* Path Planning Algorithm ☆27 · Updated 4 years ago
- ☆23 · Updated 3 months ago
- Sample implementation of OCR metrics (CER, WER) calculation with TesseractOCR and fastwer ☆27 · Updated 3 years ago