DS3Lab/WordScape

The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.

☆42

Alternatives and similar repositories for WordScape

Users who are interested in WordScape are comparing it to the repositories listed below.

Line-Kite / GraphLayoutLM
☆14 · Sep 6, 2024 · Updated last year
rubenpt91 / MP-DocVQA-Framework
☆72 · Jan 9, 2024 · Updated 2 years ago
applicaai / CCpdf
Index of URLs to pdf files all over the internet and scripts
☆25 · May 2, 2023 · Updated 3 years ago
aimagelab / FourBi
Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
☆18 · May 15, 2025 · Updated last year
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆162 · May 31, 2024 · Updated 2 years ago
jfma-USTC / HRDoc
Dataset and scripts for HRDoc
☆42 · Jun 21, 2023 · Updated 3 years ago
NNDam / vietocr-tensorrt
Create TensorRT-runtime for vietocr
☆12 · Jun 8, 2021 · Updated 5 years ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 · Nov 28, 2024 · Updated last year
applicaai / kleister-charity
☆40 · Aug 18, 2021 · Updated 4 years ago
pyxy-org / pyxy
HTML in Python
☆14 · Jul 19, 2024 · Updated 2 years ago
due-benchmark / du-schema
JSON Schema format for storing datasets details, documents processed contents, and documents annotations in the document understanding do…
☆14 · Nov 5, 2024 · Updated last year
HCIILAB / LAST
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition
☆28 · Aug 29, 2023 · Updated 2 years ago
clovaai / synthtiger
Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021
☆578 · Jun 14, 2024 · Updated 2 years ago
HAMNET-AI / PDFTriage
Reproduction paper --- PDFTriage: Question Answering over Long, Structured Documents
☆42 · Jan 16, 2024 · Updated 2 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆57 · Mar 31, 2025 · Updated last year
applicaai / kleister-nda
☆61 · Aug 18, 2021 · Updated 4 years ago
MelosY / CAM
☆27 · Feb 20, 2024 · Updated 2 years ago
Percent-BFD / neurips_submission
☆17 · Nov 23, 2023 · Updated 2 years ago
IBM / KVP10k
Repository for the KVP10k dataset
☆23 · Sep 18, 2025 · Updated 10 months ago
onealwj / MVLT
PyTorch implementation of BMVC2022 paper Masked Vision-Language Transformers for Scene Text Recognition
☆29 · Nov 11, 2022 · Updated 3 years ago
Caiyuan-Zheng / Consistency_Regularization_STR
It's the code for the paper Pushing the Performance Limit of Scene Text Recognizer without Human Annotation, CVPR 2022.
☆28 · Jul 6, 2022 · Updated 4 years ago
chongzhangFDU / Token-Path-Prediction-Datasets
This is the official repository of the revised datasets FUNSD-r and CORD-r, introduced in EMNLP 2023 paper Reading Order Matters: Informa…
☆17 · Mar 20, 2024 · Updated 2 years ago
zzyhlyoko / DCTC
☆42 · Sep 2, 2023 · Updated 2 years ago
IlyasMoutawwakil / Faster-TrOCR
TrOCR but 2 to 3 times faster
☆11 · Oct 22, 2022 · Updated 3 years ago
NUSTM / ECPE-MLL
[EMNLP2020] End-to-End Emotion-Cause Pair Extraction based on SlidingWindow Multi-Label Learning
☆20 · Oct 13, 2020 · Updated 5 years ago
cmmp / pyproclus
A python implementation of PROCLUS: PROjected CLUStering algorithm.
☆10 · Jan 12, 2015 · Updated 11 years ago
sudoAimer / TRT-Segformer
Using TensorRT to accelerate Segformer.
☆11 · Oct 6, 2023 · Updated 2 years ago
zlwang-cs / LASER-release
Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
☆14 · May 31, 2023 · Updated 3 years ago
allenai / data-efficient-finetuning
Code for paper 'Data-Efficient FineTuning'
☆28 · May 24, 2023 · Updated 3 years ago
syedsaqibbukhari / docanalysis
☆10 · Aug 5, 2019 · Updated 6 years ago
Royalvice / DocDiff
ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along w…
☆350 · Aug 22, 2024 · Updated last year
mit1208 / Document-AI
☆19 · Feb 5, 2026 · Updated 5 months ago
microsoft / MoPQ
☆13 · Nov 26, 2021 · Updated 4 years ago
fh2019ustc / DeepEraser
The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.
☆53 · Aug 26, 2024 · Updated last year
chenxn2020 / GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17 · Dec 1, 2023 · Updated 2 years ago
nayohan / SentiCSE
[COLING 2024] SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity
☆13 · May 8, 2024 · Updated 2 years ago
deepopinion / anls_star_metric
Official implementation of the ANLS* metric
☆25 · Jul 13, 2026 · Updated last week
NormXU / ERNIE-Layout-Pytorch
An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP.
☆107 · Nov 15, 2023 · Updated 2 years ago
verarong / invoice_text_renderer
Synthetic samples for invoice recognition
☆12 · Apr 23, 2021 · Updated 5 years ago