huridocs / pdf-reading-order
☆14 Updated last year
Alternatives and similar repositories for pdf-reading-order
Users who are interested in pdf-reading-order are comparing it to the libraries listed below.
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆111 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆25 Updated last year
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 Updated 2 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 Updated 3 years ago
- ☆14 Updated last year
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆17 Updated 4 months ago
- Evaluation framework for document processing models and services. ☆43 Updated this week
- Seed Machine Translation Data ☆33 Updated 10 months ago
- An open-source NLP library: fast text cleaning and preprocessing ☆23 Updated 3 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 Updated last year
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions ☆44 Updated last year
- ☆40 Updated 4 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆72 Updated this week
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆77 Updated 2 weeks ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 Updated 3 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆37 Updated last year
- ☆25 Updated 7 years ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆37 Updated 2 years ago
- A simple semantic search engine for scientific papers. ☆28 Updated 2 years ago
- Logical structure analysis for visually structured documents ☆92 Updated 3 years ago
- ☆17 Updated 4 years ago
- ☆22 Updated 4 years ago
- ☆10 Updated last year
- Efficient few-shot learning with cross-encoders. ☆59 Updated last year
- A native-Chinese, graded benchmark for testing coding ability ☆15 Updated last year
- Deploy DL/ML inference pipelines with minimal extra code. ☆99 Updated 10 months ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" ☆72 Updated this week
- ☆58 Updated 4 years ago
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas… ☆80 Updated 2 years ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 Updated last week