ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆69 · Nov 7, 2020 · Updated 5 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- The repository of Icecite, a research paper management system. ☆15 · Mar 29, 2018 · Updated 7 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 · Jan 4, 2022 · Updated 4 years ago
- Open Access PDF harvester ☆42 · May 3, 2024 · Updated last year
- Reference implementation of algorithms for reinforcement learning and Markov decision processes. ☆12 · Jan 28, 2021 · Updated 5 years ago
- Material parsers and other tools, scripts Initially developed for Grobid Superconductor ☆13 · Feb 21, 2025 · Updated 11 months ago
- A library for parsing security advisories ☆13 · Feb 5, 2026 · Updated last week
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif… ☆12 · Oct 21, 2022 · Updated 3 years ago
- Softcite software mention recognizer, finding mentions and citations to software from within the academic literature ☆81 · Sep 30, 2025 · Updated 4 months ago
- Project archived. See: https://github.com/plus3it/gravitybee/issues/486#issuecomment-1414501191 ☆18 · Feb 13, 2023 · Updated 3 years ago
- S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering) ☆21 · Nov 4, 2025 · Updated 3 months ago
- A Django project to help users to create free, fast and secure blogs on GitHub Pages and Jekyll. ☆21 · Dec 8, 2022 · Updated 3 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) ☆14 · Jul 22, 2022 · Updated 3 years ago
- Python wrapper for xpdf ☆19 · Nov 28, 2019 · Updated 6 years ago
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 · Aug 7, 2017 · Updated 8 years ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆692 · May 26, 2024 · Updated last year
- ☆17 · Feb 1, 2023 · Updated 3 years ago
- Grobid module for superconductor material and properties extraction ☆22 · May 17, 2025 · Updated 8 months ago
- Low-effort reachability analysis for third-party code vulnerabilities. ☆22 · Jul 11, 2023 · Updated 2 years ago
- A Test Collection of Computer Science Papers for Faceted Query by Example ☆22 · Nov 28, 2021 · Updated 4 years ago
- A Python dependency resolver ☆25 · Dec 22, 2025 · Updated last month
- A solver for package problems in CUDF format ☆27 · Sep 29, 2025 · Updated 4 months ago
- An index data structure for approximate string search. ☆23 · May 6, 2019 · Updated 6 years ago
- Scripts as a service. Builds on systemd (for Linux) ☆21 · Jun 18, 2023 · Updated 2 years ago
- DataSeer machine-learning service ☆28 · Sep 4, 2025 · Updated 5 months ago
- Audio Book scrapper ☆28 · Mar 27, 2024 · Updated last year
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆62 · May 3, 2024 · Updated last year
- Code for ECIR 2022 paper Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-based Reranking ☆25 · Jul 30, 2024 · Updated last year
- My collection of miscellaneous source code ☆34 · Aug 31, 2025 · Updated 5 months ago
- A machine learning software for extracting information from scholarly documents ☆4,630 · Feb 6, 2026 · Updated last week
- ☆35 · Sep 16, 2022 · Updated 3 years ago
- Science-parse version 2 ☆253 · Nov 20, 2019 · Updated 6 years ago
- An online rss reader written in clojure & javascript & java. ☆148 · May 13, 2013 · Updated 12 years ago
- Audit python packages for known vulnerabilities ☆34 · Mar 9, 2022 · Updated 3 years ago
- Maxwell Forbes & Yejin Choi — ACL 2017 ☆25 · Dec 23, 2021 · Updated 4 years ago
- An open-source CRF Reference String Parsing Package ☆160 · May 6, 2020 · Updated 5 years ago
- a large scientific paraphrase dataset for longer paraphrase generation ☆39 · Oct 17, 2022 · Updated 3 years ago
- A text parser. ☆31 · Apr 16, 2022 · Updated 3 years ago
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆146 · Jun 14, 2022 · Updated 3 years ago
- LongSumm - Scientific Document Summarization Task ☆74 · Jun 30, 2022 · Updated 3 years ago