allenai / s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆900 Updated 11 months ago
Alternatives and similar repositories for s2orc:
Users who are interested in s2orc are comparing it to the repositories listed below.
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆360 Updated 11 months ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ☆539 Updated last year
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆285 Updated 6 months ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆653 Updated 10 months ago
- Python PDF parser for scientific publications: content and figures ☆400 Updated last year
- Science-parse version 2 ☆241 Updated 5 years ago
- Python client for GROBID Web services ☆318 Updated last month
- A set of scripts to grab public datasets from resources related to arXiv ☆436 Updated 10 months ago
- A BERT model for scientific text. ☆1,574 Updated 3 years ago
- Unofficial Python client library for Semantic Scholar APIs. ☆360 Updated last month
- Data and models for the SciFact verification task. ☆227 Updated last year
- Dataset accompanying the SPECTER model ☆133 Updated 2 years ago
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆649 Updated last year
- Tools for curating biomedical training data for large-scale language modeling ☆474 Updated 3 months ago
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links ☆434 Updated 2 years ago
- A full spaCy pipeline and models for scientific/biomedical documents. ☆1,781 Updated 4 months ago
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆478 Updated 5 months ago
- Library for Knowledge Intensive Language Tasks ☆936 Updated 3 years ago
- Data and software for building the ACL Anthology. ☆502 Updated this week
- The full dataset behind paperswithcode.com ☆340 Updated 3 years ago
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆221 Updated 2 months ago
- Autoregressive Entity Retrieval ☆783 Updated last year
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆1,784 Updated this week
- Data and Code for ICLR2020 Paper "TabFact: A Large-scale Dataset for Table-based Fact Verification" ☆393 Updated last year
- Content ExtRactor and MINEr ☆493 Updated 2 years ago
- library supporting NLP and CV research on scientific papers ☆754 Updated 4 months ago
- Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper ☆391 Updated 9 months ago
- A corpus of Biomedical papers annotated with mentions of UMLS entities. ☆324 Updated 3 years ago
- Software that makes labeling PDFs easy. ☆409 Updated 10 months ago