The WordScape repository contains the code for the WordScape pipeline, which creates datasets for training document understanding models.
☆41 Dec 7, 2023 Updated 2 years ago
Alternatives and similar repositories for WordScape
Users who are interested in WordScape are comparing it to the libraries listed below.
- ☆14 Sep 6, 2024 Updated last year
- ☆70 Jan 9, 2024 Updated 2 years ago
- Index of URLs to pdf files all over the internet and scripts ☆25 May 2, 2023 Updated 2 years ago
- Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024) ☆15 May 15, 2025 Updated 10 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆163 May 31, 2024 Updated last year
- Dataset and scripts for HRDoc ☆41 Jun 21, 2023 Updated 2 years ago
- Create TensorRT-runtime for vietocr ☆12 Jun 8, 2021 Updated 4 years ago
- ☆37 Jan 26, 2026 Updated 2 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆16 Nov 28, 2024 Updated last year
- ☆40 Aug 18, 2021 Updated 4 years ago
- HTML in Python ☆12 Jul 19, 2024 Updated last year
- JSON Schema format for storing datasets details, documents processed contents, and documents annotations in the document understanding do… ☆14 Nov 5, 2024 Updated last year
- Ongoing research project for code&math LLMs ☆29 Jul 4, 2025 Updated 9 months ago
- ☆15 Apr 12, 2023 Updated 2 years ago
- Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition ☆28 Aug 29, 2023 Updated 2 years ago
- Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021 ☆574 Jun 14, 2024 Updated last year
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆22 Dec 7, 2023 Updated 2 years ago
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021) ☆57 Mar 31, 2025 Updated last year
- ☆59 Aug 18, 2021 Updated 4 years ago
- ☆18 Jul 7, 2025 Updated 9 months ago
- ☆27 Feb 20, 2024 Updated 2 years ago
- ☆82 Apr 12, 2022 Updated 3 years ago
- The most comprehensive Chinese Telegraph Code table ☆12 Jul 5, 2015 Updated 10 years ago
- PyTorch implementation of BMVC2022 paper Masked Vision-Language Transformers for Scene Text Recognition ☆29 Nov 11, 2022 Updated 3 years ago
- It's the code for the paper Pushing the Performance Limit of Scene Text Recognizer without Human Annotation, CVPR 2022. ☆28 Jul 6, 2022 Updated 3 years ago
- ☆42 Sep 2, 2023 Updated 2 years ago
- [EMNLP2020] End-to-End Emotion-Cause Pair Extraction based on SlidingWindow Multi-Label Learning ☆20 Oct 13, 2020 Updated 5 years ago
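- weixin125: Design and implementation of a personal health data management system (WeChat mini program + SSM backend), graduation project source code case design ☆11 Feb 28, 2024 Updated 2 years ago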
- A python implementation of PROCLUS: PROjected CLUStering algorithm. ☆10 Jan 12, 2015 Updated 11 years ago
- HOCR Specification Python Parser ☆12 Sep 23, 2015 Updated 10 years ago
- ☆10 Aug 5, 2019 Updated 6 years ago
- ☆19 Feb 5, 2026 Updated 2 months ago
- Code for paper 'Data-Efficient FineTuning' ☆28 May 24, 2023 Updated 2 years ago
- ACM Multimedia 2023: DocDiff: Document Enhancement via Residual Diffusion Models. Also contains 1597 red seals in Chinese scenes, along w… ☆343 Aug 22, 2024 Updated last year
- An unofficial Pytorch implementation of ERNIE-Layout which is originally released through PaddleNLP. ☆107 Nov 15, 2023 Updated 2 years ago
- ☆11 Jul 31, 2022 Updated 3 years ago
- ☆14 Jan 11, 2022 Updated 4 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆83 Jan 30, 2023 Updated 3 years ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆51 Aug 26, 2024 Updated last year