gipplab/pdf-benchmark

A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents

☆32

Alternatives and similar repositories for pdf-benchmark

Users who are interested in pdf-benchmark are comparing it to the repositories listed below.

Phyks / libbmc
A Python library to deal with scientific papers.
☆17 · Apr 2, 2016 · Updated 10 years ago
allenai / S2APLER
S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering)
☆22 · Jul 8, 2026 · Updated 2 weeks ago
wtsnjp / MioGatto
An annotation tool for grounding of formulae
☆24 · May 28, 2024 · Updated 2 years ago
SeerLabs / sbdsubjectclassifier
Scholarly Big Data Subject Category Classifier
☆10 · Jul 15, 2019 · Updated 7 years ago
realityengines / post_hoc_debiasing
☆17 · Aug 13, 2020 · Updated 5 years ago
malteos / scincl
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
☆79 · Dec 29, 2025 · Updated 6 months ago
danilo-dessi / SKG-pipeline
☆21 · May 1, 2025 · Updated last year
laurenfklein / QTM340-Fall22
Notebooks and other course materials for Emory QTM 340 (Fall 2022)
☆12 · Dec 13, 2022 · Updated 3 years ago
saidfathalla / Science-knowledge-graph-ontologies
The Science knowledge graph ontologies, a.k.a. SKGO, is a suite of OWL ontology models to capture the knowledge of scientific research da…
☆17 · Jul 3, 2025 · Updated last year
shunzh / mcts-for-llm
This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.
☆16 · Jun 28, 2024 · Updated 2 years ago
allenai / s2_fos
☆34 · Jan 2, 2024 · Updated 2 years ago
allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180 · Mar 18, 2023 · Updated 3 years ago
hamedR96 / ANTM
Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to…
☆37 · Nov 6, 2023 · Updated 2 years ago
sahandha / SINDy
☆11 · Jan 23, 2017 · Updated 9 years ago
nobu-g / cohesion-analysis
Code for COLING 2020 Paper
☆13 · Feb 3, 2026 · Updated 5 months ago
macmillancontentscience / morphemepiece
☆11 · Apr 15, 2022 · Updated 4 years ago
carriex / lfqa_eval
ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering"
☆21 · Mar 22, 2024 · Updated 2 years ago
tsafavi / cascader
CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction (arXiv 22)
☆13 · Jun 17, 2022 · Updated 4 years ago
kliakhnovich / smmr
☆17 · Nov 17, 2025 · Updated 8 months ago
webis-de / ir_axioms
↕️ Intuitive axiomatic retrieval experimentation.
☆31 · Updated this week
winter1203 / vllm_GOT2_OCR
Accelerating GOT-OCRv2 with VLLM
☆10 · Nov 15, 2024 · Updated last year
allenai / mmda
multimodal document analysis
☆166 · May 14, 2026 · Updated 2 months ago
maxdotio / neural-solr
Neural Solr = Solr 9 + Mighty Inference + Node
☆18 · Jun 9, 2022 · Updated 4 years ago
hitachi-nlp / ensemble-metrics
☆18 · Dec 25, 2023 · Updated 2 years ago
allenai / SPECTER2
☆137 · Feb 24, 2026 · Updated 4 months ago
deadbits / vector-embedding-api
Flask API for generating text embeddings using OpenAI or sentence_transformers
☆14 · Sep 1, 2023 · Updated 2 years ago
informagi / mmead
MS Marco Entity Annotations Disambiguation
☆14 · May 19, 2023 · Updated 3 years ago
guggio / bbc_news
☆10 · May 29, 2020 · Updated 6 years ago
COMBINE-lab / piscem-infer
☆15 · May 22, 2026 · Updated 2 months ago
IBM / retrieval-table-augmentation
This is the code for reproducing the TABBIE baseline in our paper: "Retrieval-Based Transformer for Table Augmentation"
☆12 · Sep 17, 2025 · Updated 10 months ago
ub-unibe-ch / ds-pytools
The Python Digital Toolbox contains examples of how to solve various data analysis problems using Python libraries.
☆16 · May 8, 2026 · Updated 2 months ago
openvisualizationacademy / ova_site
Website for Open Visualization Academy
☆15 · Jun 30, 2026 · Updated 3 weeks ago
benkeser / info550
Repository for Emory University Rollins SPH INFO550
☆16 · Aug 18, 2022 · Updated 3 years ago
tingyaohsu / SciCap
SciCap Dataset
☆59 · Nov 5, 2021 · Updated 4 years ago
radi-cho / RSTOD
Auxiliary tasks for task-oriented dialogue systems. Published in ICNLSP'22 and indexed in the ACL Anthology.
☆17 · Feb 27, 2023 · Updated 3 years ago
Planteome / plant-stress-ontology
An ontology containing biotic and abiotic plant stresses. Part of the Planteome suite of reference ontologies. Formerly called the Onto…
☆18 · Apr 14, 2026 · Updated 3 months ago
neuged / webanno_tsv
A small Python library to parse and write TSV files generated by the WebAnno software.
☆11 · Apr 14, 2025 · Updated last year
elsevierlabs / OA-STM-Corpus
Corpus of Open Access articles from multiple fields in Science, Technology, and Medicine.
☆75 · Mar 28, 2017 · Updated 9 years ago
JanaLasser / SICSS-aachen-graz
Repository for the learning materials of the Aachen-Graz SICSS location.
☆19 · Oct 19, 2023 · Updated 2 years ago