dell-research-harvard / effocr
A model(ing framework) for sample efficient OCR
☆56 Updated last year
Alternatives and similar repositories for effocr:
Users interested in effocr are comparing it to the libraries listed below.
- A Large Dataset of Historical Japanese Documents with Complex Layouts ☆32 Updated 2 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆36 Updated last year
- Table detection (TD) and table structure recognition (TSR) using Yolov5/Yolov8, and you can get the same (even better) result compared wi… ☆44 Updated 6 months ago
- [ICDAR 2023] (Oral) An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation ☆70 Updated 4 months ago
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions ☆42 Updated 9 months ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆174 Updated last month
- Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset ☆25 Updated last year
- Noise-robust de-duplication at scale ☆15 Updated last year
- [ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral) ☆38 Updated last year
- The official Github for the American Stories dataset as in {link} ☆112 Updated 10 months ago
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications. ☆25 Updated last month
- Datasets and Evaluation Scripts for CompHRDoc ☆31 Updated 9 months ago
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat… ☆53 Updated last year
- Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents" ☆35 Updated last year
- OCR & Ground Truth Resources ☆74 Updated 2 years ago
- CTE: Contextualized Table Extraction Dataset ☆17 Updated last year
- OCR Annotations from Amazon Textract for Industry Documents Library ☆101 Updated 2 years ago
- A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers ☆55 Updated 4 months ago
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis ☆15 Updated 3 years ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆204 Updated last year
- Official implementation for Dessurt ☆57 Updated 2 years ago
- TeX compilation service that makes use of arXiv.org's AutoTeX library. ☆27 Updated 7 months ago
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆117 Updated last year
- CVPR 2022: Table Structure Recognition ☆39 Updated 2 years ago
- Document Image Binarization ☆75 Updated 3 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆96 Updated 4 months ago
- Object Detection Model for Scanned Documents ☆86 Updated last year
- A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition ☆116 Updated 5 months ago