peetio / Diard

From documents (PDF) or document images to analysis-ready semi-structured data.

☆21

Related projects:

AILab-UniFI / cte-dataset
CTE: Contextualized Table Extraction Dataset
☆17 Updated last year
peetio / table-transformer-simple-inference
Simple table extraction example.
☆10 Updated 2 years ago
phamquiluan / table-transformer
CVPR 2022: Table Structure Recognition
☆39 Updated 2 years ago
adlnlp / form_nlu
☆12 Updated 3 months ago
MaitySubhajit / SelfDocSeg
[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)
☆36 Updated 11 months ago
DS3Lab / DocParser
☆75 Updated 2 years ago
biswassanket / DocSegTr
A Bottom-Up Instance Segmentation Strategy for segmenting document instances using Transformers
☆52 Updated last week
adlnlp / doc_gcn
☆16 Updated last year
allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆166 Updated last year
applicaai / kleister-charity
☆37 Updated 3 years ago
jfma-USTC / HRDoc
Dataset and scripts for HRDoc
☆30 Updated last year
doc-analysis / ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
☆90 Updated 3 weeks ago
herobd / FUDGE
Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing"
☆33 Updated 2 years ago
abdoelsayed2016 / TNCR_Dataset
Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection, Table classi…
☆63 Updated 6 months ago
ZZR8066 / GraphDoc
☆40 Updated 2 years ago
applicaai / lambert
Publicly released code for the LAMBERT model
☆103 Updated 3 years ago
allanj / LayoutLMv3-DocVQA
Example codebase for fine-tuning LayoutLMv3 on DocVQA
☆48 Updated 2 years ago
IBM / SynthTabNet
Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files
☆121 Updated 10 months ago
rubenpt91 / MP-DocVQA-Framework
☆52 Updated 8 months ago
herobd / dessurt
Official implementation for Dessurt
☆56 Updated last year
j-rausch / DSG
☆29 Updated 5 months ago
furkanbiten / idl_data
OCR Annotations from Amazon Textract for Industry Documents Library
☆99 Updated 2 years ago
applicaai / kleister-nda
☆55 Updated 3 years ago
sarkhelritesh / vrd_resource
☆23 Updated 3 years ago
rossumai / docile
DocILE: Document Information Localization and Extraction Benchmark
☆116 Updated 4 months ago
wanghaisheng / ocr-arxiv-daily
☆17 Updated last year
andreagemelli / doc2graph
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
☆113 Updated last year
allenai / mmda
multimodal document analysis
☆159 Updated 3 months ago
mavillot / FUNSD-Entity-Linking
☆9 Updated 2 years ago
EriCongMa / awesome-transformer-ocr
This repository shares current progress in transformer-based optical character recognition (OCR). Contributions are welcome.
☆44 Updated 11 months ago