titipata / scipdf_parser
Python PDF parser for scientific publications: content and figures
☆328 · Updated 5 months ago
Related projects:
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆332 · Updated 5 months ago
- Python client for GROBID Web services ☆279 · Updated 3 weeks ago
- Tools to scrape publication metadata from pubmed, arxiv, medrxiv and chemrxiv. ☆211 · Updated 2 months ago
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ ☆802 · Updated 4 months ago
- Unofficial Python client library for Semantic Scholar APIs. ☆287 · Updated 2 months ago
- Get answers to research questions from 200M+ papers. Link to demo - ☆203 · Updated 8 months ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆256 · Updated 11 months ago
- A proof of concept to scrape papers from journals ☆225 · Updated 3 months ago
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆596 · Updated 6 months ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ☆508 · Updated last year
- Science-parse version 2 ☆228 · Updated 4 years ago
- Software that makes labeling PDFs easy. ☆380 · Updated 4 months ago
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆166 · Updated last week
- Incorporating distribution of experts in order to better predict the future discovery of novel scientific connections ☆22 · Updated 10 months ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆616 · Updated 3 months ago
- multimodal document analysis ☆159 · Updated 3 months ago
- A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journal… ☆183 · Updated last year
- library supporting NLP and CV research on scientific papers ☆669 · Updated 5 months ago
- A Python library for OpenAlex (openalex.org) ☆142 · Updated last week
- 📄 ⚙️ ETL processes for medical and scientific papers ☆342 · Updated 9 months ago
- SciRepEval benchmark training and evaluation scripts ☆67 · Updated 4 months ago
- A set of scripts to grab public datasets from resources related to arXiv ☆399 · Updated 3 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆166 · Updated last year
- https://doi.org/10.1093/bioinformatics/btz228 ☆38 · Updated last year
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links ☆414 · Updated 2 years ago
- A toolkit for automatically extracting semantic information from PDF files of scientific articles ☆61 · Updated 8 months ago
- SpanMarker for Named Entity Recognition ☆384 · Updated last month
- Explore and interpret large embeddings in your browser with interactive visualization! 📍 ☆401 · Updated 7 months ago
- Dataset accompanying the SPECTER model ☆127 · Updated last year
- A python library that implements the Crossref API. ☆265 · Updated 2 months ago