A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents
☆31 · Dec 8, 2022 · Updated 3 years ago
Alternatives and similar repositories for pdf-benchmark
Users that are interested in pdf-benchmark are comparing it to the libraries listed below.
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) ☆21 · Nov 4, 2025 · Updated 5 months ago
- ☆20 · May 1, 2025 · Updated 11 months ago
- An annotation tool for grounding of formulae ☆24 · May 28, 2024 · Updated last year
- The Science knowledge graph ontologies, a.k.a. SKGO, is a suite of OWL ontology models to capture the knowledge of scientific research da… ☆16 · Jul 3, 2025 · Updated 9 months ago
- ☆31 · Apr 10, 2023 · Updated 3 years ago
- A script to generate tagged XML citation strings for citation parsing ☆20 · Apr 17, 2020 · Updated 5 years ago
- Scholarly Big Data Subject Category Classifier ☆10 · Jul 15, 2019 · Updated 6 years ago
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit. ☆16 · Jun 28, 2024 · Updated last year
- ☆127 · Feb 24, 2026 · Updated last month
- ☆34 · Jan 2, 2024 · Updated 2 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆180 · Mar 18, 2023 · Updated 3 years ago
- ☆18 · Aug 21, 2025 · Updated 7 months ago
- ☆26 · Apr 4, 2026 · Updated last week
- Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to… ☆37 · Nov 6, 2023 · Updated 2 years ago
- ☆11 · Apr 15, 2022 · Updated 3 years ago
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction (arXiv 22) ☆13 · Jun 17, 2022 · Updated 3 years ago
- A recurrent neural network model to analyze how travelers expressed their feelings on Twitter ☆12 · Jun 30, 2019 · Updated 6 years ago
- multimodal document analysis ☆165 · Feb 28, 2026 · Updated last month
- Accelerating GOT-OCRv2 with VLLM ☆10 · Nov 15, 2024 · Updated last year
- The Python Digital Toolbox contains examples of how to solve various data analysis problems using Python libraries. ☆15 · Mar 30, 2026 · Updated 2 weeks ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆18 · Jun 9, 2022 · Updated 3 years ago
- A minimalistic deep learning framework resembling the PyTorch API. ☆17 · Jan 5, 2026 · Updated 3 months ago
- ☆18 · Dec 25, 2023 · Updated 2 years ago
- Flask API for generating text embeddings using OpenAI or sentence_transformers ☆14 · Sep 1, 2023 · Updated 2 years ago
- Code for ICML 2025 paper | Joint Localization and Activation Editing for Low-Resource Fine-Tuning ☆28 · Jun 18, 2025 · Updated 9 months ago
- This is the code for reproducing the TABBIE baseline in our paper: "Retrieval-Based Transformer for Table Augmentation" ☆12 · Sep 17, 2025 · Updated 6 months ago
- Interactive Data Augmentation (CHI 2025) ☆32 · Mar 20, 2025 · Updated last year
- https://weeklykagglenews.substack.com ☆24 · Dec 31, 2022 · Updated 3 years ago
- Datably.ai ☆17 · Jun 17, 2025 · Updated 9 months ago
- SciCap Dataset ☆58 · Nov 5, 2021 · Updated 4 years ago
- ☆15 · Mar 27, 2026 · Updated 2 weeks ago
- ☆11 · Aug 8, 2025 · Updated 8 months ago
- Code for EMNLP'20 paper "When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models" ☆11 · Nov 10, 2020 · Updated 5 years ago
- Self-Service Semantic Suite (S4) ☆18 · Sep 29, 2016 · Updated 9 years ago
- An ontology containing biotic and abiotic plant stresses. Part of the Planteome suite of reference ontologies. Formerly called the Onto… ☆18 · Updated this week
- The Airplane Ticket Booking and Management System is a modern web application designed to streamline the process of booking and managing … ☆24 · Nov 6, 2024 · Updated last year
- GloSAT Historical Measurement Table Dataset ☆11 · Dec 3, 2025 · Updated 4 months ago
- This repository hosts the dataset for the paper Computer Science Named Entity Recognition in the Open Research Knowledge Graph ☆22 · Jan 8, 2024 · Updated 2 years ago
- A Lucene Indexer for XML, with lexical analysis (lemmatization for French) ☆18 · Updated this week