A project for benchmarking and evaluating existing PDF extraction tools on how well they semantically extract the body text from PDF documents, especially scientific articles.
☆ 71 · Nov 7, 2020 · Updated 5 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- AI Assistance for Writing Scientific Alt Text · ☆ 14 · Feb 7, 2024 · Updated 2 years ago
- The repository of Icecite, a research paper management system. · ☆ 15 · Mar 29, 2018 · Updated 8 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. · ☆ 76 · Jan 4, 2022 · Updated 4 years ago
- PDF Extraction Toolkit · ☆ 43 · Nov 23, 2020 · Updated 5 years ago
- Service to scan licenses from source code · ☆ 12 · Aug 14, 2023 · Updated 2 years ago
- PDF article title extraction tool · ☆ 13 · Oct 9, 2015 · Updated 10 years ago
- A library for parsing security advisories · ☆ 13 · Apr 13, 2026 · Updated 3 weeks ago
- Open Access PDF harvester · ☆ 42 · May 3, 2024 · Updated 2 years ago
- Material parsers and other tools and scripts, initially developed for Grobid Superconductor · ☆ 13 · Feb 21, 2025 · Updated last year
- Softcite software mention recognizer, finding mentions and citations to software from within the academic literature · ☆ 84 · Apr 16, 2026 · Updated 2 weeks ago
- (no description) · ☆ 12 · Mar 24, 2021 · Updated 5 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification · ☆ 180 · Mar 18, 2023 · Updated 3 years ago
- Finding mentions and citations to named and implicit research datasets from within the academic literature · ☆ 30 · Jun 14, 2025 · Updated 10 months ago
- Functional and structural analysis of tables in research papers (Table disentangling) · ☆ 21 · Aug 7, 2017 · Updated 8 years ago
- Reference implementation of algorithms for reinforcement learning and Markov decision processes. · ☆ 12 · Jan 28, 2021 · Updated 5 years ago
- A pure-Python RPM reader · ☆ 20 · Apr 11, 2024 · Updated 2 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) · ☆ 14 · Jul 22, 2022 · Updated 3 years ago
- Project archived. See: https://github.com/plus3it/gravitybee/issues/486#issuecomment-1414501191 · ☆ 18 · Feb 13, 2023 · Updated 3 years ago
- This Python project develops an LDA model which trains on various Wikipedia articles based on a keyword and then suggests Wikipedia articl… · ☆ 10 · Oct 22, 2019 · Updated 6 years ago
- (no description) · ☆ 18 · Feb 1, 2023 · Updated 3 years ago
- An index data structure for approximate string search. · ☆ 23 · May 6, 2019 · Updated 7 years ago
- Scripts as a service. Builds on systemd (for Linux) · ☆ 21 · Mar 10, 2026 · Updated last month
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif… · ☆ 12 · Oct 21, 2022 · Updated 3 years ago
- Utilities for filesystem exploration and automated builds · ☆ 21 · Apr 7, 2026 · Updated 3 weeks ago
- DataSeer machine-learning service · ☆ 28 · Sep 4, 2025 · Updated 8 months ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. · ☆ 699 · May 26, 2024 · Updated last year
- (no description) · ☆ 41 · Feb 25, 2018 · Updated 8 years ago
- Generate changelogs from commit tags and shortlogs · ☆ 28 · Nov 2, 2025 · Updated 6 months ago
- Data structures and code to read/write BioC XML and JSON. · ☆ 33 · Aug 21, 2023 · Updated 2 years ago
- Superconductor materials dataset · ☆ 27 · Dec 5, 2023 · Updated 2 years ago
- (no description) · ☆ 98 · May 20, 2022 · Updated 3 years ago
- Repository for the NAACL 2019 paper on citation intent prediction · ☆ 129 · Dec 1, 2019 · Updated 6 years ago
- Converter from UD-trees to BART representation · ☆ 35 · Mar 6, 2024 · Updated 2 years ago
- Scripts for going from raw .fastq files to processed and quality-checked .bam files for downstream analysis · ☆ 14 · Nov 23, 2021 · Updated 4 years ago
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… · ☆ 10 · Nov 4, 2022 · Updated 3 years ago
- Science-parse version 2 · ☆ 257 · Nov 20, 2019 · Updated 6 years ago
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) · ☆ 21 · Updated this week
- Grobid module for superconductor material and properties extraction · ☆ 22 · May 17, 2025 · Updated 11 months ago
- The Semantic Scholar Search Reranker · ☆ 113 · Oct 26, 2020 · Updated 5 years ago