IllDepence/unarXive

A data set based on all arXiv publications, pre-processed for NLP, including structured full text and a citation network.

☆305

Alternatives and similar repositories for unarXive

Users interested in unarXive are also comparing it to the repositories listed below.

allenai / s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
☆1,075 · Updated Apr 26, 2024 (2 years ago)

sairin1202 / SciXGen
Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"
☆13 · Updated Feb 14, 2022 (4 years ago)

tetsu9923 / SciReviewGen
Official dataset repository for "SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation."
☆21 · Updated Jun 4, 2023 (3 years ago)

lin-ao / enhancing_the_makg
Code for the Master Thesis "Enhancing the Microsoft Academic Knowledge Graph"
☆14 · Updated Sep 28, 2020 (5 years ago)

allenai / peS2o
Pretraining Efficiently on S2ORC!
☆187 · Updated Oct 23, 2024 (last year)

davidjurgens / citation-function
Measuring the Evolution of a Scientific Field through Citation Frames
☆63 · Updated Oct 5, 2018 (7 years ago)

WING-NUS / SciAssist
☆20 · Updated Feb 17, 2024 (2 years ago)

allenai / specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
☆586 · Updated Jun 12, 2023 (3 years ago)

leopoldwhite / GraphDancer
GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning
☆20 · Updated May 25, 2026 (2 months ago)

qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆66 · Updated Jul 8, 2024 (2 years ago)

chang-github-00 / LLM-Predictive-Decoding
☆16 · Updated Jul 9, 2025 (last year)

UKPLab / SciGen
☆21 · Updated Jan 18, 2022 (4 years ago)

allenai / s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
☆473 · Updated Apr 11, 2024 (2 years ago)

Rojak-NLP / LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
☆11 · Updated Apr 18, 2023 (3 years ago)

allenai / vila
Incorporating VIsual LAyout Structures for Scientific Text Classification
☆180 · Updated Mar 18, 2023 (3 years ago)

copenlu / cite-worth
Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding"
☆14 · Updated Sep 8, 2022 (3 years ago)

michaelfaerber / data-set-knowledge-graph
Code for generating a high-quality knowledge graph with metadata about datasets and links to publications
☆28 · Updated Apr 8, 2022 (4 years ago)

allenai / S2AFF
Links raw affiliation strings to ROR IDs
☆31 · Updated Mar 3, 2026 (4 months ago)

allenai / science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
☆702 · Updated May 26, 2024 (2 years ago)

dki-lab / few-shot-bioIE
True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning
☆12 · Updated Jul 6, 2022 (4 years ago)

kermitt2 / datastet
Finding mentions and citations to named and implicit research datasets from within the academic literature
☆31 · Updated Jun 14, 2025 (last year)

siftech / SciClaim
A dataset of fine-grained knowledge graphs of scientific claims
☆17 · Updated Sep 24, 2021 (4 years ago)

Yale-LILY / ReasTAP
Data and Code for EMNLP 2022 paper "ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples"
☆15 · Updated Jun 4, 2023 (3 years ago)

commoncrawl / ia-web-commons
Web archiving utility library
☆11 · Updated this week

cuinfoscience / INFO5613-Fall2021
INFO 5613 Network Science
☆22 · Updated Oct 28, 2021 (4 years ago)

ariecattan / SciCo
Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OF…
☆30 · Updated Oct 17, 2021 (4 years ago)

yikee / FLIP
Small Reward Models via Backward Inference
☆21 · Updated May 25, 2026 (2 months ago)

allenai / aries
Aligned, Review-Informed Edits of Scientific Papers
☆55 · Updated Jul 5, 2023 (3 years ago)

lyh6560new / P3Sum
The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"
☆10 · Updated Jun 23, 2024 (2 years ago)

allenai / multicite
MultiCite code and data. Models are available on Huggingface.
☆37 · Updated May 10, 2022 (4 years ago)

donglixp / ICL_PaperList
Paper List for In-context Learning 🌷
☆19 · Updated Jan 3, 2023 (3 years ago)

EagleW / Scientific-Inspiration-Machines-Optimized-for-Novelty
Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"
☆95 · Updated Apr 13, 2024 (2 years ago)

allenai / scitldr
☆759 · Updated May 22, 2023 (3 years ago)

FormalizedFormalLogic / Arithmetization
Formalization of Arithmetization of Mathematics/Metamathematics
☆14 · Updated Mar 8, 2025 (last year)

allenai / mup
☆18 · Updated Oct 22, 2022 (3 years ago)

mattbierbaum / arxiv-public-datasets
A set of scripts to grab public datasets from resources related to arXiv
☆479 · Updated May 20, 2024 (2 years ago)

ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF …
☆73 · Updated Nov 7, 2020 (5 years ago)

yikee / ScienceMeter
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models, COLM 2026
☆17 · Updated Jun 28, 2025 (last year)

SiyuanWangw / StepwiseQA
The code for the paper "Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering"
☆22 · Updated Sep 1, 2022 (3 years ago)