huridocs / pdf-reading-order
☆15 Updated last year
Alternatives and similar repositories for pdf-reading-order
Users who are interested in pdf-reading-order are comparing it to the libraries listed below.
- Small python package to measure OCR quality and other related metrics. ☆25 Updated last year
- Seed Machine Translation Data ☆33 Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆115 Updated last year
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆18 Updated 6 months ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆77 Updated this week
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 Updated 3 years ago
- Logical structure analysis for visually structured documents ☆94 Updated 3 years ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 Updated 5 years ago
- Post-processing OCR errors with seq2seq models ☆28 Updated 5 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 Updated 3 years ago
- multimodal document analysis ☆166 Updated last month
- Segmenting a given document using recursive xy-cut algorithm. ☆12 Updated 7 years ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 Updated 3 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 Updated last year
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆38 Updated 2 years ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 Updated 2 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 Updated 2 years ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆23 Updated 5 months ago
- Using short models to classify long texts ☆21 Updated 2 years ago
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection. ☆44 Updated 2 years ago
- Evaluation framework for document processing models and services. ☆58 Updated this week
- Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT ☆23 Updated 2 years ago
- Deploy DL/ML inference pipelines with minimal extra code. ☆102 Updated last year
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR ☆15 Updated 4 years ago
- Correction of spaces with character-based neural language models. ☆13 Updated 3 years ago
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ☆71 Updated 2 years ago