A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents
☆30Dec 8, 2022Updated 3 years ago
Alternatives and similar repositories for pdf-benchmark
Users who are interested in pdf-benchmark are comparing it to the libraries listed below.
- ☆20May 1, 2025Updated 10 months ago
- The Science knowledge graph ontologies, a.k.a. SKGO, is a suite of OWL ontology models to capture the knowledge of scientific research da…☆16Jul 3, 2025Updated 8 months ago
- A script to generate tagged XML citation strings for citation parsing☆20Apr 17, 2020Updated 5 years ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on …☆13Mar 25, 2024Updated last year
- Scholarly Big Data Subject Category Classifier☆10Jul 15, 2019Updated 6 years ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)☆76Dec 29, 2025Updated 2 months ago
- ☆123Feb 24, 2026Updated 3 weeks ago
- ☆34Jan 2, 2024Updated 2 years ago
- ☆18Aug 21, 2025Updated 7 months ago
- Generating graph structures from OWL ontologies☆12Nov 21, 2017Updated 8 years ago
- ☆26Mar 4, 2026Updated 2 weeks ago
- ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering"☆21Mar 22, 2024Updated 2 years ago
- Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to…☆37Nov 6, 2023Updated 2 years ago
- A Python module and REST API for automatic extraction of metadata from PDF files☆18Nov 11, 2024Updated last year
- ☆11Apr 15, 2022Updated 3 years ago
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction (arXiv 22)☆13Jun 17, 2022Updated 3 years ago
- Targeted Data Generation with Large Language Models☆19Jun 25, 2024Updated last year
- Collection of LaTeX utility packages for scientific documents☆17Sep 13, 2023Updated 2 years ago
- ☆96Mar 11, 2026Updated last week
- A recurrent neural network model to analyze how travelers expressed their feelings on Twitter☆12Jun 30, 2019Updated 6 years ago
- multimodal document analysis☆165Feb 28, 2026Updated 3 weeks ago
- Accelerating GOT-OCRv2 with VLLM☆11Nov 15, 2024Updated last year
- The Python Digital Toolbox contains examples of how to solve various data analysis problems using Python libraries.☆15Oct 21, 2025Updated 5 months ago
- The Python crash course of the Summer Institute in Computational Social Science 2022!☆10Nov 19, 2022Updated 3 years ago
- This is the code for reproducing the TABBIE baseline in our paper: "Retrieval-Based Transformer for Table Augmentation"☆12Sep 17, 2025Updated 6 months ago
- Datably.ai☆17Jun 17, 2025Updated 9 months ago
- SciCap Dataset☆57Nov 5, 2021Updated 4 years ago
- ☆11Aug 8, 2025Updated 7 months ago
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆24Dec 11, 2024Updated last year
- Self-Service Semantic Suite (S4)☆18Sep 29, 2016Updated 9 years ago
- Auxiliary tasks for task-oriented dialogue systems. Published in ICNLSP'22 and indexed in the ACL Anthology.☆17Feb 27, 2023Updated 3 years ago
- ☆21Jul 18, 2024Updated last year
- A small Python library to parse and write TSV files generated by the WebAnno software.☆11Apr 14, 2025Updated 11 months ago
- An ontology containing biotic and abiotic plant stresses. Part of the Planteome suite of reference ontologies. Formerly called the Onto…☆18Mar 3, 2026Updated 2 weeks ago
- A Python package for interacting with the MinerU Vision-Language Model.☆109Feb 5, 2026Updated last month
- Repository for paper CELLS: A Parallel Corpus for Biomedical Lay Language Generation☆19Apr 2, 2024Updated last year
- Scanning Single Shot Detector for Math in Document Images☆133Apr 18, 2023Updated 2 years ago
- An approach that takes advantage of deep learning features to use unannotated samples for NER and to identify sequences with erroneous labels.☆16Feb 4, 2024Updated 2 years ago
- The official repo for DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph☆18Oct 13, 2024Updated last year