jbarrow / distillate
PDF Extraction Toolkit (wraps and trains LayoutLM)
☆10 · Updated 4 years ago
Alternatives and similar repositories for distillate
Users who are interested in distillate are comparing it to the libraries listed below.
- XFUND: A Multilingual Form Understanding Benchmark ☆215 · Updated 3 years ago
- Publicly released code for the LAMBERT model ☆103 · Updated 4 years ago
- ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,... ☆182 · Updated 4 years ago
- Example codebase for fine-tuning LayoutLMv3 on DocVQA ☆52 · Updated 3 years ago
- Unofficial code for augment-XY-CUT in XYLayoutLM ☆30 · Updated 3 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆115 · Updated last year
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆179 · Updated 2 years ago
- Evaluation of the LayoutLM model on the CORD dataset ☆32 · Updated 3 years ago
- Table understanding dataset for comparative evaluation of different table understanding algorithms ☆14 · Updated 7 years ago
- ICDAR 2021 Competition on Scientific Literature Parsing ☆35 · Updated 5 years ago
- This is the official repository of the EMNLP 2023 paper Reading Order Matters: Information Extraction from Visually-rich Documents by Tok… ☆18 · Updated last year
- Chinese document classification with LayoutLMv3 and LayoutXLM ☆46 · Updated 3 years ago
- This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa… ☆17 · Updated last year
- Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents" ☆37 · Updated 2 years ago
- Dataset and scripts for HRDoc ☆40 · Updated 2 years ago
- 🌳 CED: Catalog Extraction from Documents ☆16 · Updated 2 years ago
- A step-by-step C# implementation of the Docstrum algorithm ☆23 · Updated 5 years ago
- An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP ☆107 · Updated 2 years ago
- Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files ☆152 · Updated 3 months ago
- Document Visual Question Answering ☆128 · Updated 5 years ago
- CTE: Contextualized Table Extraction Dataset ☆17 · Updated 2 years ago