A project for benchmarking and evaluating existing PDF extraction tools on their semantic ability to extract body text from PDF documents, especially scientific articles.
☆70 · Nov 7, 2020 · Updated 5 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 · Jan 4, 2022 · Updated 4 years ago
- A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources ☆17 · May 14, 2023 · Updated 2 years ago
- PDF Extraction Toolkit ☆43 · Nov 23, 2020 · Updated 5 years ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 · Oct 3, 2023 · Updated 2 years ago
- PDF article title extraction tool ☆13 · Oct 9, 2015 · Updated 10 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · Sep 8, 2022 · Updated 3 years ago
- A library for parsing security advisories ☆13 · Feb 5, 2026 · Updated 2 months ago
- Open Access PDF harvester ☆42 · May 3, 2024 · Updated last year
- Packaging Metadata Comparisons ☆18 · Apr 3, 2020 · Updated 6 years ago
- Finding mentions and citations to named and implicit research datasets from within the academic literature ☆30 · Jun 14, 2025 · Updated 10 months ago
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 · Aug 7, 2017 · Updated 8 years ago
- Reference implementation of algorithms for reinforcement learning and Markov decision processes. ☆12 · Jan 28, 2021 · Updated 5 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) ☆14 · Jul 22, 2022 · Updated 3 years ago
- Project archived. See: https://github.com/plus3it/gravitybee/issues/486#issuecomment-1414501191 ☆18 · Feb 13, 2023 · Updated 3 years ago
- Large-scale dataset of patent drawings and image retrieval baseline. ☆41 · Jul 5, 2022 · Updated 3 years ago
- A browser extension providing Open Access bibliographical services ☆18 · Dec 9, 2022 · Updated 3 years ago
- ☆18 · Feb 1, 2023 · Updated 3 years ago
- A Django project to help users create free, fast and secure blogs on GitHub Pages and Jekyll. ☆21 · Dec 8, 2022 · Updated 3 years ago
- An index data structure for approximate string search. ☆23 · May 6, 2019 · Updated 6 years ago
- An open-source CRF Reference String Parsing Package ☆161 · May 6, 2020 · Updated 5 years ago
- Molar is a database management system that makes it easy to store experiments, whether computational or not ☆11 · Jul 15, 2022 · Updated 3 years ago
- Scripts as a service. Builds on systemd (for Linux) ☆21 · Mar 10, 2026 · Updated last month
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif… ☆12 · Oct 21, 2022 · Updated 3 years ago
- Low-effort reachability analysis for third-party code vulnerabilities. ☆22 · Jul 11, 2023 · Updated 2 years ago
- DataSeer machine-learning service ☆28 · Sep 4, 2025 · Updated 7 months ago
- Open-source stochastic GW software ☆13 · Apr 28, 2025 · Updated 11 months ago
- Identifying Used Methods and Datasets in Scientific Publications ☆18 · Jan 14, 2021 · Updated 5 years ago
- Script to help maintain a wheelhouse folder on cloud storage. ☆33 · Aug 4, 2020 · Updated 5 years ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆62 · May 3, 2024 · Updated last year
- ☆98 · May 20, 2022 · Updated 3 years ago
- A gold-standard dataset of software mentions in research publications. ☆38 · Jul 27, 2023 · Updated 2 years ago
- Repository for NAACL 2019 paper on Citation Intent prediction ☆130 · Dec 1, 2019 · Updated 6 years ago
- Converter from UD-trees to BART representation ☆35 · Mar 6, 2024 · Updated 2 years ago
- JavaScript-based component for highlighting text-mined annotations of different semantic types in a full text article identified by a PMC… ☆11 · Nov 29, 2016 · Updated 9 years ago
- Python bindings for the Unitex/GramLab corpus processor ☆10 · Nov 25, 2022 · Updated 3 years ago
- Science-parse version 2 ☆255 · Nov 20, 2019 · Updated 6 years ago
- Grobid module for superconductor material and properties extraction ☆22 · May 17, 2025 · Updated 10 months ago
- Audit Python packages for known vulnerabilities ☆34 · Mar 9, 2022 · Updated 4 years ago
- Machine learning software for extracting information from scholarly documents ☆4,776 · Apr 9, 2026 · Updated last week