mattbierbaum / arxiv-public-datasets
A set of scripts to grab public datasets from resources related to arXiv
☆427 · Updated 9 months ago
Alternatives and similar repositories for arxiv-public-datasets:
Users who are interested in arxiv-public-datasets are comparing it to the repositories listed below.
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ☆533 · Updated last year
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ ☆872 · Updated 9 months ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆285 · Updated 4 months ago
- A python module to scrape arxiv.org for a date range and category ☆292 · Updated last year
- Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON) ☆356 · Updated 10 months ago
- Dataset accompanying the SPECTER model ☆130 · Updated 2 years ago
- Tools for extracting tables and results from Machine Learning papers ☆400 · Updated 2 years ago
- A Visual Analysis Tool to Explore Learned Representations in Transformers Models ☆586 · Updated last year
- Python wrapper for the arXiv API ☆1,187 · Updated 7 months ago
- Data and models for the SciFact verification task. ☆226 · Updated last year
- The full dataset behind paperswithcode.com ☆333 · Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆175 · Updated last year
- Python client for GROBID Web services ☆308 · Updated 3 weeks ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆644 · Updated 8 months ago
- Science-parse version 2 ☆236 · Updated 5 years ago
- Self-Supervision for Named Entity Disambiguation at the Tail ☆215 · Updated 2 years ago
- SciRepEval benchmark training and evaluation scripts ☆72 · Updated 9 months ago
- Autoregressive Entity Retrieval ☆781 · Updated last year
- multimodal document analysis ☆162 · Updated 8 months ago
- Get answers to research questions from 200M+ papers. Link to demo - ☆205 · Updated last year
- The SOTA extractor pipeline ☆316 · Updated 11 months ago
- Data and Code for ICLR2020 Paper "TabFact: A Large-scale Dataset for Table-based Fact Verification" ☆385 · Updated last year
- A BERT model for scientific text. ☆1,560 · Updated 2 years ago
- The Semantic Scholar Search Reranker ☆104 · Updated 4 years ago
- Pretraining Efficiently on S2ORC! ☆156 · Updated 3 months ago
- Collection of public APIs for embedding scientific papers ☆56 · Updated 4 years ago
- Interpretable Evaluation for AI Systems ☆361 · Updated last year
- The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to o… ☆381 · Updated last year
- UnifiedQA: Crossing Format Boundaries With a Single QA System ☆431 · Updated 2 years ago
- Extracting scientific claims from biomedical abstracts (powered by AllenNLP) ☆141 · Updated 3 years ago