huridocs / pdf-reading-order
☆13Updated last year
Alternatives and similar repositories for pdf-reading-order
Users that are interested in pdf-reading-order are comparing it to the libraries listed below
- Large-scale query-focused multi-document Summarization dataset☆10Updated 3 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl…☆17Updated 2 weeks ago
- ☆14Updated 8 months ago
- ☆25Updated 7 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval☆29Updated 2 years ago
- Seed Machine Translation Data☆32Updated 7 months ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction☆24Updated 3 years ago
- CTE: Contextualized Table Extraction Dataset☆17Updated 2 years ago
- The collection of building blocks for building fine-tunable metric learning models☆32Updated 2 months ago
- Multilingual Entity Linking model by BELA model☆12Updated last year
- Small python package to measure OCR quality and other related metrics.☆23Updated last year
- Segmenting a given document using recursive xy-cut algorithm.☆12Updated 6 years ago
- An open-source NLP library: fast text cleaning and preprocessing☆23Updated 3 years ago
- code and data used to build a training dataset for dragnet models☆10Updated 4 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction"☆18Updated 4 years ago
- KuaiSearch PERKS☆11Updated 3 years ago
- Streamlit Named Entity Recognition (NER) annotation custom component☆38Updated 2 years ago
- mSimCSE: Multilingual SimCSE☆34Updated 2 years ago
- A simple semantic search engine for scientific papers.☆28Updated last year
- Reading order, Layoutreader☆16Updated last month
- DFKI Layout Detection for OCR-D☆47Updated last month
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras…☆11Updated 6 years ago
- Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for Low-Resource Legal NLP☆10Updated last year
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data☆10Updated 3 years ago
- A text augmentation tool for named entity recognition.☆53Updated 3 years ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆75Updated 8 months ago
- Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022)☆14Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆48Updated last year
- TeX compilation service that makes use of arXiv.org's AutoTeX library.☆33Updated last week
- WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web.☆22Updated 4 years ago