PedroBarcha / old-books-dataset
Pages from old books, with ground truth, formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) are to be uploaded eventually.
☆14 Updated 8 years ago
Alternatives and similar repositories for old-books-dataset
Users who are interested in old-books-dataset are comparing it to the repositories listed below
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" ☆33 Updated 3 years ago
- CVPR 2022: Table Structure Recognition ☆40 Updated 3 years ago
- ☆18 Updated last year
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction ☆27 Updated 6 years ago
- Handwritten text recognition using transformers. ☆158 Updated last year
- Extraction of meaningful instances from document images with a Chargrid model ☆34 Updated 4 years ago
- Easter2.0: IMPROVING CONVOLUTIONAL MODELS FOR HANDWRITTEN TEXT RECOGNITION ☆79 Updated 2 years ago
- TableNet Implementation on Pytorch ☆149 Updated 2 years ago
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction ☆32 Updated 3 years ago
- ☆15 Updated 5 years ago
- Pytorch Implementation of TableNet ☆67 Updated 4 years ago
- Detect textlines in document images ☆92 Updated last year
- Form images from U.S. National Archives annotated with text bounding boxes, classes, relationships, and transcription. ☆38 Updated 3 years ago
- Attention-based sequence-to-sequence model for handwritten word recognition ☆62 Updated last year
- CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images ☆133 Updated 2 months ago
- Key Information Extraction From Documents: Evaluation And Generator ☆20 Updated 4 years ago
- Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection, Table classi… ☆70 Updated last year
- Research papers and code on information extraction from image/pdf ☆97 Updated 2 years ago
- DIAR software for synthetic document image and groundtruth generation, with various degradation models for data augmentation ☆128 Updated last year
- Pytorch implementation of our paper: Adapting OCR with Limited Labels ☆62 Updated last year
- An application of high resolution GANs to dewarp images of perturbed documents ☆148 Updated 4 years ago
- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition ☆282 Updated 3 years ago
- TableNet: Deep Learning model for end-to-end Table Detection and Tabular data extraction from Scanned Data Images In modern times, more a… ☆62 Updated 3 years ago
- ☆16 Updated 4 years ago
- Code for BMVC2020 paper "Text and Style Conditioned GAN for Generation of Offline Handwriting Lines" ☆73 Updated 2 years ago
- PyTorch Re-Implementation of CRAFT: Character Region Awareness for Text Detection ☆26 Updated 4 years ago
- DocILE: Document Information Localization and Extraction Benchmark ☆138 Updated last year
- Sample implementation of OCR metrics (CER, WER) calculation with TesseractOCR and fastwer ☆29 Updated 4 years ago
- This repository contains a 403 images dataset for table detection in documents. ☆83 Updated 7 years ago
- Detectron2 for Document Layout Analysis ☆187 Updated last year