allenai/s2orc

allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

☆1,075

Alternatives and similar repositories for s2orc

Users who are interested in s2orc are comparing it to the repositories listed below.

allenai / s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
☆473 · Apr 11, 2024 · Updated 2 years ago
allenai / scicite
Repository for NAACL 2019 paper on Citation Intent prediction
☆130 · Dec 1, 2019 · Updated 6 years ago
allenai / scidocs
Dataset accompanying the SPECTER model
☆148 · Dec 19, 2022 · Updated 3 years ago
allenai / science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
☆702 · May 26, 2024 · Updated 2 years ago
allenai / specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆586 · Jun 12, 2023 · Updated 3 years ago
allenai / scibert
A BERT model for scientific text.
☆1,705 · Feb 22, 2022 · Updated 4 years ago
allenai / SciREX
Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122
☆140 · Jul 25, 2024 · Updated 2 years ago
allenai / spv2
Science-parse version 2
☆257 · Nov 20, 2019 · Updated 6 years ago
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,022 · Updated this week
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Updated this week
kermitt2 / article_dataset_builder
Open Access PDF harvester, metadata aggregator and full-text ingester
☆62 · May 3, 2024 · Updated 2 years ago
IllDepence / unarXive
A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆305 · Sep 28, 2024 · Updated last year
malteos / scincl
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)
☆79 · Dec 29, 2025 · Updated 6 months ago
allenai / peS2o
Pretraining Efficiently on S2ORC!
☆187 · Oct 23, 2024 · Updated last year
allenai / mup
☆18 · Oct 22, 2022 · Updated 3 years ago
copenlu / cite-worth
Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"
☆14 · Sep 8, 2022 · Updated 3 years ago
kermitt2 / biblio_glutton_harvester
Open Access PDF harvester
☆42 · May 3, 2024 · Updated 2 years ago
allenai / scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
☆1,977 · Dec 4, 2025 · Updated 7 months ago
allenai / scifact
Data and models for the SciFact verification task.
☆267 · Oct 15, 2023 · Updated 2 years ago
allenai / papermage
Library supporting NLP and CV research on scientific papers
☆800 · Nov 8, 2024 · Updated last year
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
allenai / paper-embedding-public-apis
Collection of public APIs for embedding scientific papers
☆59 · Feb 19, 2021 · Updated 5 years ago
danielnsilva / semanticscholar
Unofficial Python client library for Semantic Scholar APIs.
☆475 · Jul 3, 2026 · Updated 3 weeks ago
greenelab / opencitations
Processing OpenCitations Data
☆20 · Aug 17, 2017 · Updated 8 years ago
viswavi / CitationIE
☆30 · Jun 11, 2021 · Updated 5 years ago
allenai / scirepeval
SciRepEval benchmark training and evaluation scripts
☆89 · May 5, 2026 · Updated 2 months ago
allenai / multicite
MultiCite code and data. Models are available on Huggingface.
☆37 · May 10, 2022 · Updated 4 years ago
DataSeer / dataseer-ml
DataSeer machine-learning service
☆28 · Sep 4, 2025 · Updated 10 months ago
allenai / S2AND
Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite
☆111 · Updated this week
allenai / ForeCite
☆35 · Sep 16, 2022 · Updated 3 years ago
kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
napsternxg / awesome-scholarly-data-analysis
A curated collection of resources on scholarly data analysis ranging from datasets, papers, and code about bibliometrics, citation analys…
☆204 · Jul 30, 2025 · Updated 11 months ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
titipata / pubmed_parser
A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
☆734 · Jul 31, 2025 · Updated 11 months ago
kermitt2 / entity-fishing
A machine learning tool for fishing entities
☆268 · Feb 27, 2026 · Updated 4 months ago
davidjurgens / citation-function
Measuring the Evolution of a Scientific Field through Citation Frames
☆63 · Oct 5, 2018 · Updated 7 years ago
allenai / s2-folks
Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.
☆279 · Jan 24, 2025 · Updated last year
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 · May 14, 2023 · Updated 3 years ago