A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆300 · last updated Sep 28, 2024
Alternatives and similar repositories for unarXive
Users interested in unarXive are comparing it to the repositories listed below.
- Repository for the NAACL 2019 paper on citation intent prediction ☆129 · last updated Dec 1, 2019
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ ☆1,057 · last updated Apr 26, 2024
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" ☆13 · last updated Feb 14, 2022
- Official dataset repository for "SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation." ☆21 · last updated Jun 4, 2023
- This repository provides details and links to the ACL Anthology corpus/collection, including .bib files, PDFs, and GROBID extractions of the PDFs ☆192 · last updated Oct 12, 2023
- Pretraining Efficiently on S2ORC! ☆183 · last updated Oct 23, 2024
- Measuring the Evolution of a Scientific Field through Citation Frames ☆63 · last updated Oct 5, 2018
- ☆20 · last updated Feb 17, 2024
- AASC: ACL Anthology Sentence Corpus ☆20 · last updated Oct 28, 2020
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ☆579 · last updated Jun 12, 2023
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆66 · last updated Jul 8, 2024
- ☆15 · last updated Jul 9, 2025
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆466 · last updated Apr 11, 2024
- ☆20 · last updated Jan 18, 2022
- Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122 ☆140 · last updated Jul 25, 2024
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆180 · last updated Mar 18, 2023
- ☆21 · last updated Jan 15, 2024
- Code for generating a high-quality knowledge graph with metadata about datasets and links to publications ☆28 · last updated Apr 8, 2022
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 · last updated Jun 28, 2025
- Finding mentions and citations to named and implicit research datasets from within the academic literature ☆30 · last updated Jun 14, 2025
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · last updated Sep 8, 2022
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆699 · last updated May 26, 2024
- Aligned, Review-Informed Edits of Scientific Papers ☆55 · last updated Jul 5, 2023
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning ☆12 · last updated Jul 6, 2022
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆17 · last updated Apr 25, 2021
- INFO 5613 Network Science ☆22 · last updated Oct 28, 2021
- Link raw affiliations to ROR IDs ☆31 · last updated Mar 3, 2026
- The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables" ☆23 · last updated Dec 21, 2023
- Web archiving utility library ☆11 · last updated May 5, 2026
- Dataset accompanying the SPECTER model ☆146 · last updated Dec 19, 2022
- ☆12 · last updated Jan 20, 2025
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 · last updated May 27, 2022
- MultiCite code and data. Models are available on Hugging Face. ☆34 · last updated May 10, 2022
- A dataset of fine-grained knowledge graphs of scientific claims ☆16 · last updated Sep 24, 2021
- Data and Code for EMNLP 2022 paper "ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples" ☆15 · last updated Jun 4, 2023
- The guts for computing data for OpenAlex. For more, see https://openalex.org/.