allenai/pdffigures2

Given a scholarly PDF, extract figures, tables, captions, and section titles.

☆750

Alternatives and similar repositories for pdffigures2

Users who are interested in pdffigures2 are comparing it to the libraries listed below.

allenai / deepfigures-open
Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖
☆148 · Jun 14, 2022 · Updated 4 years ago
allenai / science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
☆702 · May 26, 2024 · Updated 2 years ago
allenai / spv2
Science-parse version 2
☆257 · Nov 20, 2019 · Updated 6 years ago
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,022 · Updated this week
titipata / scipdf_parser
Python PDF parser for scientific publications: content and figures
☆455 · Mar 21, 2024 · Updated 2 years ago
allenai / figureseer
☆41 · May 15, 2020 · Updated 6 years ago
greenelab / opencitations
Processing OpenCitations Data
☆20 · Aug 17, 2017 · Updated 8 years ago
apple2373 / figure-separator
Compound figure separation tool
☆22 · Jun 13, 2024 · Updated 2 years ago
MuiseDestiny / zotero-figure
Capture every figure, table, and formula in a PDF in one go: a Zotero plugin.
☆546 · Updated this week
SeerLabs / pdfmef
Multi-Entity Extraction Framework for Academic Documents (with default extraction tools)
☆31 · Oct 3, 2023 · Updated 2 years ago
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Updated this week
allenai / s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆1,075 · Apr 26, 2024 · Updated 2 years ago
allenai / scibert
A BERT model for scientific text.
☆1,705 · Feb 22, 2022 · Updated 4 years ago
allenai / s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
☆473 · Apr 11, 2024 · Updated 2 years ago
ibm-aur-nlp / PubLayNet
☆1,053 · Jul 9, 2025 · Updated last year
allenai / specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆586 · Jun 12, 2023 · Updated 3 years ago
greenelab / crossref
Download metadata for all DOIs using the Crossref API
☆66 · Sep 25, 2018 · Updated 7 years ago
CeON / CERMINE
Content ExtRactor and MINEr
☆512 · Jun 30, 2022 · Updated 4 years ago
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
IBM / science-result-extractor
☆100 · May 20, 2022 · Updated 4 years ago
kermitt2 / pdfalto
PDF to XML ALTO file converter
☆272 · Updated this week
allenai / papermage
Library supporting NLP and CV research on scientific papers
☆800 · Nov 8, 2024 · Updated last year
facebookresearch / nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
☆10,050 · Feb 21, 2025 · Updated last year
softcite / software-mentions
Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
☆85 · Jun 6, 2026 · Updated last month
allenai / paper-embedding-public-apis
Collection of public APIs for embedding scientific papers
☆59 · Feb 19, 2021 · Updated 5 years ago
KMCS-NII / AASC
AASC: ACL Anthology Sentence Corpus
☆20 · Oct 28, 2020 · Updated 5 years ago
allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180 · Mar 18, 2023 · Updated 3 years ago
ropensci-archive / alm
ARCHIVED R Client for the Lagotto Altmetrics Platform
☆15 · May 10, 2022 · Updated 4 years ago
kermitt2 / article_dataset_builder
Open Access PDF harvester, metadata aggregator and full-text ingester
☆62 · May 3, 2024 · Updated 2 years ago
windingwind / zotero-plugin-template
A plugin template for Zotero.
☆835 · Mar 9, 2026 · Updated 4 months ago
retorquere / zotero-date-from-last-modified
☆122 · Jun 24, 2026 · Updated last month
allenai / scidocs
Dataset accompanying the SPECTER model
☆148 · Dec 19, 2022 · Updated 3 years ago
franzbischoff / zotero-pdf-metadata
WIP - Updates the User PDF file metadata when title or author is changed
☆22 · Apr 22, 2024 · Updated 2 years ago
BobLd / DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
☆637 · Oct 1, 2023 · Updated 2 years ago
zotero / make-it-red
Sample plugin for Zotero 7
☆83 · Jan 8, 2025 · Updated last year
titipata / pubmed_parser
A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
☆734 · Jul 31, 2025 · Updated 11 months ago
copenlu / cite-worth
Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"
☆14 · Sep 8, 2022 · Updated 3 years ago
ibm-aur-nlp / PubTabNet
☆484 · Jul 8, 2025 · Updated last year
Acemap / pdf_parser
All in one PDF Parser Toolkit
☆17 · Sep 15, 2023 · Updated 2 years ago