A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
☆301Sep 28, 2024Updated last year
Alternatives and similar repositories for unarXive
Users who are interested in unarXive are comparing it to the libraries listed below.
- Repository for NAACL 2019 paper on Citation Intent prediction☆130Dec 1, 2019Updated 6 years ago
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/☆1,062Apr 26, 2024Updated 2 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"☆13Feb 14, 2022Updated 4 years ago
- Official dataset repository for "SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation."☆21Jun 4, 2023Updated 3 years ago
- Code for the Master Thesis "Enhancing the Microsoft Academic Knowledge Graph"☆14Sep 28, 2020Updated 5 years ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs☆193Oct 12, 2023Updated 2 years ago
- Pretraining Efficiently on S2ORC!☆186Oct 23, 2024Updated last year
- Measuring the Evolution of a Scientific Field through Citation Frames☆64Oct 5, 2018Updated 7 years ago
- SPECTER: Document-level Representation Learning using Citation-informed Transformers☆582Jun 12, 2023Updated 3 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆66Jul 8, 2024Updated last year
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)☆469Apr 11, 2024Updated 2 years ago
- ☆20Jan 18, 2022Updated 4 years ago
- Can LLMs generate code-mixed sentences through zero-shot prompting?☆11Apr 18, 2023Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆180Mar 18, 2023Updated 3 years ago
- ☆21Jan 15, 2024Updated 2 years ago
- code for generating a high-quality knowledge graph with metadata about datasets and links to publications☆28Apr 8, 2022Updated 4 years ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models☆17Jun 28, 2025Updated 11 months ago
- Finding mentions and citations to named and implicit research datasets from within the academic literature☆30Jun 14, 2025Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"☆14Sep 8, 2022Updated 3 years ago
- ☆53Feb 21, 2025Updated last year
- Science Parse parses scientific papers (in PDF form) and returns them in structured form.☆700May 26, 2024Updated 2 years ago
- True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning☆12Jul 6, 2022Updated 3 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.☆17Apr 25, 2021Updated 5 years ago
- link raw affiliation to ROR ids☆31Mar 3, 2026Updated 3 months ago
- The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables"☆23Dec 21, 2023Updated 2 years ago
- Web archiving utility library☆11May 5, 2026Updated last month
- Dataset accompanying the SPECTER model☆148Dec 19, 2022Updated 3 years ago
- The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"☆10Jun 23, 2024Updated last year
- ☆12Jan 20, 2025Updated last year
- Neuralized version of the Reference String Parser component of the ParsCit package.