A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆298 · Sep 28, 2024 · Updated last year
Alternatives and similar repositories for unarXive
Users that are interested in unarXive are comparing it to the libraries listed below.
- Repository for NAACL 2019 paper on Citation Intent prediction · ☆130 · Dec 1, 2019 · Updated 6 years ago
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ · ☆1,038 · Apr 26, 2024 · Updated last year
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation" · ☆13 · Feb 14, 2022 · Updated 4 years ago
- Official dataset repository for "SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation." · ☆19 · Jun 4, 2023 · Updated 2 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs · ☆191 · Oct 12, 2023 · Updated 2 years ago
- Pretraining Efficiently on S2ORC! · ☆183 · Oct 23, 2024 · Updated last year
- Measuring the Evolution of a Scientific Field through Citation Frames · ☆63 · Oct 5, 2018 · Updated 7 years ago
- ☆20 · Feb 17, 2024 · Updated 2 years ago
- AASC: ACL Anthology Sentence Corpus · ☆20 · Oct 28, 2020 · Updated 5 years ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers · ☆575 · Jun 12, 2023 · Updated 2 years ago
- ☆15 · Jul 9, 2025 · Updated 9 months ago
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) · ☆463 · Apr 11, 2024 · Updated 2 years ago
- ☆20 · Jan 18, 2022 · Updated 4 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification · ☆180 · Mar 18, 2023 · Updated 3 years ago
- ☆21 · Jan 15, 2024 · Updated 2 years ago
- code for generating a high-quality knowledge graph with metadata about datasets and links to publications · ☆28 · Apr 8, 2022 · Updated 4 years ago
- Finding mentions and citations to named and implicit research datasets from within the academic literature · ☆30 · Jun 14, 2025 · Updated 10 months ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" · ☆14 · Sep 8, 2022 · Updated 3 years ago
- ☆53 · Feb 21, 2025 · Updated last year
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. · ☆698 · May 26, 2024 · Updated last year
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning · ☆12 · Jul 6, 2022 · Updated 3 years ago
- INFO 5613 Network Science · ☆22 · Oct 28, 2021 · Updated 4 years ago
- The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables" · ☆23 · Dec 21, 2023 · Updated 2 years ago
- Web archiving utility library · ☆11 · Mar 11, 2026 · Updated last month
- The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization" · ☆10 · Jun 23, 2024 · Updated last year
- Science of Science · ☆183 · Feb 16, 2026 · Updated 2 months ago
- Neuralized version of the Reference String Parser component of the ParsCit package. · ☆81 · May 27, 2022 · Updated 3 years ago
- ☆12 · Jan 20, 2025 · Updated last year
- SciWING is a modern toolkit for scientific document processing from WING-NUS · ☆63 · May 1, 2023 · Updated 2 years ago
- MultiCite code and data. Models are available on Huggingface. · ☆33 · May 10, 2022 · Updated 3 years ago
- A dataset of fine-grained knowledge graphs of scientific claims · ☆16 · Sep 24, 2021 · Updated 4 years ago
- The guts for computing data for OpenAlex. For more, see https://openalex.org/. · ☆152 · Mar 6, 2026 · Updated last month
- SciGen · ☆24 · Aug 10, 2021 · Updated 4 years ago
- Official implementation of the ACL 2024: Scientific Inspiration Machines Optimized for Novelty · ☆94 · Apr 13, 2024 · Updated 2 years ago
- Formalization of Arithmetization of Mathematics/Metamathematics · ☆14 · Mar 8, 2025 · Updated last year
- ☆758 · May 22, 2023 · Updated 2 years ago
- A set of scripts to grab public datasets from resources related to arXiv · ☆477 · May 20, 2024 · Updated last year
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … · ☆70 · Nov 7, 2020 · Updated 5 years ago
- This is the Github repo of "CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Rese… · ☆38 · Oct 7, 2021 · Updated 4 years ago