huridocs / pdf-reading-order
☆13 · Updated last year
Alternatives and similar repositories for pdf-reading-order
Users who are interested in pdf-reading-order are comparing it to the libraries listed below.
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆109 · Updated last year
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆17 · Updated 2 months ago
- Seed Machine Translation Data ☆33 · Updated 9 months ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 · Updated 2 years ago
- ☆14 · Updated 10 months ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆37 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 · Updated last year
- ☆40 · Updated 4 years ago
- Efficient few-shot learning with cross-encoders. ☆57 · Updated last year
- ☆25 · Updated 7 years ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 · Updated 3 years ago
- ☆10 · Updated last year
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch ☆78 · Updated 3 weeks ago
- Index of URLs to pdf files all over the internet and scripts ☆24 · Updated 2 years ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆33 · Updated last year
- DFKI Layout Detection for OCR-D ☆47 · Updated 3 months ago
- Post-processing OCR errors with seq2seq models ☆28 · Updated 5 years ago
- ☆58 · Updated 4 years ago
- Logical structure analysis for visually structured documents ☆91 · Updated 3 years ago
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas… ☆80 · Updated 2 years ago
- Universal text classifier for generative models ☆24 · Updated last year
- Implementation of the DocLLM paper for Llama models. ☆13 · Updated 4 months ago
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆48 · Updated 2 years ago
- multimodal document analysis ☆165 · Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated 10 months ago
- CTE: Contextualized Table Extraction Dataset ☆17 · Updated 2 years ago
- ☆20 · Updated 4 years ago