huridocs / pdf-reading-order
☆14Updated last year
Alternatives and similar repositories for pdf-reading-order
Users who are interested in pdf-reading-order are comparing it to the libraries listed below
- ReadingBank: A Benchmark Dataset for Reading Order Detection☆109Updated last year
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod…☆20Updated 2 years ago
- Seed Machine Translation Data☆33Updated 10 months ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆29Updated 2 years ago
- Small python package to measure OCR quality and other related metrics.☆25Updated last year
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl…☆17Updated 3 months ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training☆17Updated 11 months ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"☆37Updated last year
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction☆24Updated 3 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated last year
- Using short models to classify long texts☆21Updated 2 years ago
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions☆44Updated last year
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection.☆42Updated 2 years ago
- multimodal document analysis☆166Updated last year
- We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datas…☆80Updated 2 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…☆33Updated last year
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0.☆105Updated 3 years ago
- Deploy DL/ML inference pipelines with minimal extra code.☆99Updated 10 months ago
- A collection of building blocks for building fine-tunable metric learning models☆32Updated 5 months ago
- Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"☆25Updated 2 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Updated 2 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆75Updated 11 months ago
- An open-source NLP library: fast text cleaning and preprocessing☆23Updated 3 years ago
- A simple semantic search engine for scientific papers.☆28Updated 2 years ago
- PyTorch-IE: State-of-the-art Information Extraction in PyTorch☆78Updated last month
- 🚀🤗 A collection of templates for Hugging Face Spaces☆35Updated last year