allenai / pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
★661 · Updated last year
Alternatives and similar repositories for pdffigures2
Users who are interested in pdffigures2 are comparing it to the libraries listed below.
- Python PDF parser for scientific publications: content and figures ★405 · Updated last year
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" ★141 · Updated 2 years ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ★659 · Updated last year
- Science-parse version 2 ★244 · Updated 5 years ago
- Python client for GROBID Web services ★330 · Updated 3 months ago
- Command line tool to extract figures, tables, and captions from scholarly documents in PDF form. ★130 · Updated 7 years ago
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ ★938 · Updated last year
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ★407 · Updated last year
- Content ExtRactor and MINEr ★494 · Updated 2 years ago
- PDF to XML ALTO file converter ★240 · Updated last week
- Unofficial Python client library for Semantic Scholar APIs. ★371 · Updated 3 months ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ★289 · Updated 8 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ★177 · Updated 2 years ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ★551 · Updated last year
- Library supporting NLP and CV research on scientific papers ★772 · Updated 6 months ago
- Software that makes labeling PDFs easy. ★416 · Updated last year
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ★67 · Updated 4 years ago
- A set of scripts to grab public datasets from resources related to arXiv ★446 · Updated last year
- DocBank: A Benchmark Dataset for Document Layout Analysis ★611 · Updated 9 months ago
- A machine learning software for extracting information from scholarly documents ★4,068 · Updated this week
- Pyzotero: a Python client for the Zotero API ★1,035 · Updated last week
- Neuralized version of the Reference String Parser component of the ParsCit package. ★81 · Updated 3 years ago
- GROBID extension for identifying and normalizing physical quantities. ★82 · Updated 2 weeks ago
- https://doi.org/10.1093/bioinformatics/btz228 ★39 · Updated 6 months ago
- ★40 · Updated 5 years ago
- A BERT model for scientific text. ★1,603 · Updated 3 years ago
- A curated collection of resources on scholarly data analysis ranging from datasets, papers, and code about bibliometrics, citation analys… ★186 · Updated 3 months ago
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ★228 · Updated 4 months ago
- A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journal… ★202 · Updated 2 years ago
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ★139 · Updated 8 months ago