huridocs / pdf-reading-order
☆13Updated last year
Alternatives and similar repositories for pdf-reading-order
Users who are interested in pdf-reading-order are comparing it to the libraries listed below
- Large-scale query-focused multi-document Summarization dataset☆10Updated 3 years ago
- Small python package to measure OCR quality and other related metrics.☆22Updated last year
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction"☆18Updated 4 years ago
- GPT-jax based on the official huggingface library☆13Updated 3 years ago
- Segmenting a given document using recursive xy-cut algorithm.☆12Updated 6 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"☆36Updated last year
- Streamlit Named Entity Recognition (NER) annotation custom component☆38Updated 2 years ago
- ☆39Updated 3 years ago
- ☆14Updated 8 months ago
- CTE: Contextualized Table Extraction Dataset☆17Updated 2 years ago
- Using short models to classify long texts☆21Updated 2 years ago
- Seed Machine Translation Data☆32Updated 6 months ago
- ☆28Updated 4 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection☆105Updated 9 months ago
- ☆25Updated 7 years ago
- Index of URLs to pdf files all over the internet and scripts☆23Updated 2 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl…☆17Updated 2 months ago
- An open-source NLP library: fast text cleaning and preprocessing☆23Updated 3 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models☆33Updated 2 years ago
- ☆17Updated 4 years ago
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training☆16Updated 7 months ago
- GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning☆27Updated 4 years ago
- A collection of building blocks for building fine-tunable metric learning models☆32Updated last month
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆29Updated 2 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python.☆21Updated last year
- A text augmentation tool for named entity recognition.☆52Updated 3 years ago
- ☆12Updated 5 months ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras…☆11Updated 6 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi…☆33Updated last year
- Large Scale BERT Distillation☆32Updated 2 years ago