jbarrow / distillate
PDF Extraction Toolkit (wraps and trains LayoutLM)
☆10 · Updated 3 years ago
Alternatives and similar repositories for distillate:
Users who are interested in distillate compare it to the repositories listed below:
- ☆76 · Updated 2 years ago
- ICDAR 2021 Competition on Scientific Literature Parsing ☆34 · Updated 4 years ago
- Publicly released code for the LAMBERT model ☆101 · Updated 3 years ago
- XFUND: A Multilingual Form Understanding Benchmark ☆193 · Updated 2 years ago
- Example codebase for fine-tuning LayoutLMv3 on DocVQA ☆49 · Updated 2 years ago
- ☆55 · Updated 3 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆96 · Updated 4 months ago
- ☆37 · Updated 3 years ago
- Unofficial code for augment-XY-CUT in XYLayoutLM ☆28 · Updated 2 years ago
- 🌳 CED: Catalog Extraction from Documents ☆15 · Updated last year
- Evaluation of the LayoutLM model on the CORD dataset ☆32 · Updated 2 years ago
- CTE: Contextualized Table Extraction Dataset ☆17 · Updated last year
- DocBankLoader is a dataset loader for DocBank that can also convert DocBank to the format used by object detection models. ☆23 · Updated 3 years ago
- Key Information Extraction From Documents: Evaluation And Generator ☆20 · Updated 3 years ago
- Chinese document classification with LayoutLMv3 and LayoutXLM ☆42 · Updated 2 years ago
- Implementation of the research paper "Deep Splitting and Merging for Table Structure Decomposition" ☆61 · Updated 2 years ago
- Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files ☆132 · Updated last year
- ☆12 · Updated 4 months ago
- ☆87 · Updated 4 years ago
- ☆79 · Updated 2 years ago
- Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset ☆25 · Updated last year
- Repository to use/train segmentation models for document layout analysis ☆19 · Updated 3 years ago
- ☆36 · Updated 4 years ago
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis ☆15 · Updated 3 years ago
- Language-agnostic BERT Sentence Embedding (LaBSE) ☆143 · Updated 4 years ago
- Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset ☆47 · Updated last year
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆174 · Updated last year
- An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP. ☆101 · Updated last year
- This repository contains a dataset of 403 images for table detection in documents. ☆83 · Updated 6 years ago